Commit Graph

21689 Commits

Author SHA1 Message Date
Quentin Colombet 52ffa6711b [RAGreedy] Empty live-ranges always succeed in last chance recoloring.
Relax the constraint for empty live-ranges while doing last chance
recoloring. Indeed, those live-ranges do not need an actual color to be
fond for the recoloring to work.
Empty live-range may happen as a result of splitting/spilling.

Unfortunately no test case for in-tree targets.

llvm-svn: 284152
2016-10-13 19:27:48 +00:00
Nirav Dave 4b36957243 In visitSTORE, always use FindBetterChain, rather than only when UseAA is enabled.
Retrying after upstream changes.

   Simplify Consecutive Merge Store Candidate Search

   Now that address aliasing is much less conservative, push through
   simplified store merging search which only checks for parallel stores
   through the chain subgraph. This is cleaner as the separation of
   non-interfering loads/stores from the store-merging logic.

   Whem merging stores, search up the chain through a single load, and
   finds all possible stores by looking down from through a load and a
   TokenFactor to all stores visited. This improves the quality of the
   output SelectionDAG and generally the output CodeGen (with some
   exceptions).

   Additional Minor Changes:

       1. Finishes removing unused AliasLoad code
       2. Unifies the the chain aggregation in the merged stores across
       code paths
       3. Re-add the Store node to the worklist after calling
       SimplifyDemandedBits.
       4. Increase GatherAllAliasesMaxDepth from 6 to 18. That number is
       arbitrary, but seemed sufficient to not cause regressions in
       tests.

   This finishes the change Matt Arsenault started in r246307 and
   jyknight's original patch.

   Many tests required some changes as memory operations are now
   reorderable. Some tests relying on the order were changed to use
   volatile memory operations

   Noteworthy tests:

    CodeGen/AArch64/argument-blocks.ll -
      It's not entirely clear what the test_varargs_stackalign test is
      supposed to be asserting, but the new code looks right.

    CodeGen/AArch64/arm64-memset-inline.lli -
    CodeGen/AArch64/arm64-stur.ll -
    CodeGen/ARM/memset-inline.ll -

      The backend now generates *worse* code due to store merging
      succeeding, as we do do a 16-byte constant-zero store efficiently.

    CodeGen/AArch64/merge-store.ll -
      Improved, but there still seems to be an extraneous vector insert
      from an element to itself?

    CodeGen/PowerPC/ppc64-align-long-double.ll -
      Worse code emitted in this case, due to the improved store->load
      forwarding.

    CodeGen/X86/dag-merge-fast-accesses.ll -
    CodeGen/X86/MergeConsecutiveStores.ll -
    CodeGen/X86/stores-merging.ll -
    CodeGen/Mips/load-store-left-right.ll -
      Restored correct merging of non-aligned stores

    CodeGen/AMDGPU/promote-alloca-stored-pointer-value.ll -
      Improved. Correctly merges buffer_store_dword calls

    CodeGen/AMDGPU/si-triv-disjoint-mem-access.ll -
      Improved. Sidesteps loading a stored value and
      merges two stores

    CodeGen/X86/pr18023.ll -
      This test has been removed, as it was asserting incorrect
      behavior. Non-volatile stores *CAN* be moved past volatile loads,
      and now are.

    CodeGen/X86/vector-idiv.ll -
    CodeGen/X86/vector-lzcnt-128.ll -
      It's basically impossible to tell what these tests are actually
      testing. But, looks like the code got better due to the memory
      operations being recognized as non-aliasing.

    CodeGen/X86/win32-eh.ll -
      Both loads of the securitycookie are now merged.

    CodeGen/AMDGPU/vgpr-spill-emergency-stack-slot-compute.ll -
      This test appears to work but no longer exhibits the spill behavior.

Reviewers: arsenm, hfinkel, tstellarAMD, jyknight, nhaehnle

Subscribers: wdng, nhaehnle, nemanjai, arsenm, weimingz, niravd, RKSimon, aemerson, qcolombet, dsanders, resistor, tstellarAMD, t.p.northover, spatel

Differential Revision: https://reviews.llvm.org/D14834

llvm-svn: 284151
2016-10-13 19:20:16 +00:00
Simon Pilgrim cb59b5257c [DAGCombiner] Add vector support to (mul (shl X, Y), Z) -> (shl (mul X, Z), Y) style combines
llvm-svn: 284122
2016-10-13 14:04:35 +00:00
Simon Pilgrim fa8fadc0e5 [DAGCombiner] Add vector support to C2-(A+C1) -> (C2-C1)-A folding
llvm-svn: 284117
2016-10-13 12:49:31 +00:00
Simon Pilgrim 833b8a2071 [DAGCombiner] Add vector support to (sub -1, x) -> (xor x, -1) canonicalization
Improves commutation potential

llvm-svn: 284113
2016-10-13 12:05:20 +00:00
Krzysztof Parzyszek abc0662f04 Handle lane masks in LivePhysRegs when adding live-ins
Differential Revision: https://reviews.llvm.org/D25533

llvm-svn: 284076
2016-10-12 22:53:41 +00:00
Albert Gutowski 795d7d6381 Create llvm.addressofreturnaddress intrinsic
Summary: We need a new LLVM intrinsic to implement MS _AddressOfReturnAddress builtin on 64-bit Windows.

Reviewers: majnemer, rnk

Subscribers: llvm-commits

Differential Revision: https://reviews.llvm.org/D25293

llvm-svn: 284061
2016-10-12 22:13:19 +00:00
Krzysztof Parzyszek d62669d791 [MIRParser] Parse lane masks for register live-ins
Differential Revision: https://reviews.llvm.org/D25530

llvm-svn: 284052
2016-10-12 21:06:45 +00:00
Krzysztof Parzyszek 8271be9a1d Do not remove implicit defs in BranchFolder
Branch folder removes implicit defs if they are the only non-branching
instructions in a block, and the branches do not use the defined registers.
The problem is that in some cases these implicit defs are required for
the liveness information to be correct.

Differential Revision: https://reviews.llvm.org/D25478

llvm-svn: 284036
2016-10-12 19:50:57 +00:00
Matt Arsenault 691efe03bd BranchRelaxation: Unique live ins when creating block
llvm-svn: 284018
2016-10-12 15:32:04 +00:00
Simon Pilgrim 08190943cb [DAGCombiner] Update most ADD combines to support general vector combines
Add a number of helper functions to match scalar or vector equivalent constant/splat values to allow most of the combine patterns to be used by vectors.

Differential Revision: https://reviews.llvm.org/D25374

llvm-svn: 284015
2016-10-12 13:48:10 +00:00
Konstantin Zhuravlyov 081385a74e [DAGCombiner] Do not remove the load of stored values when optimizations are disabled
This combiner breaks debug experience and should not be run when optimizations are disabled.

For example:
  int main() {
    int j = 0;
    j += 2;
    if (j == 2)
      return 0;
    return 5;
  }
When debugging this code compiled in /O0, it should be valid to break at line "j+=2;" and edit the value of j. It should change the return value of the function.

Differential Revision: https://reviews.llvm.org/D19268

llvm-svn: 284014
2016-10-12 13:44:24 +00:00
Michael Kuperstein 7adbf6b042 [DAG] Fix crash in build_vector -> vector_shuffle combine
Fixes a crash in the build_vector -> vector_shuffle combine
when the first vector input is twice as wide as the output,
and the second input vector is even wider.

llvm-svn: 283953
2016-10-11 22:44:31 +00:00
Tim Northover 60cf6fc58f MIRParser: allow types on registers with a RegBank.
This fixes some GlobalISel regression tests.

llvm-svn: 283936
2016-10-11 20:50:04 +00:00
Kyle Butt 0846e56e63 Codegen: Tail-duplicate during placement.
The tail duplication pass uses an assumed layout when making duplication
decisions. This is fine, but passes up duplication opportunities that
may arise when blocks are outlined. Because we want the updated CFG to
affect subsequent placement decisions, this change must occur during
placement.

In order to achieve this goal, TailDuplicationPass is split into a
utility class, TailDuplicator, and the pass itself. The pass delegates
nearly everything to the TailDuplicator object, except for looping over
the blocks in a function. This allows the same code to be used for tail
duplication in both places.

This change, in concert with outlining optional branches, allows
triangle shaped code to perform much better, esepecially when the
taken/untaken branches are correlated, as it creates a second spine when
the tests are small enough.

Issue from previous rollback fixed, and a new test was added for that
case as well. Issue was worklist/scheduling/taildup issue in layout.

Issue from 2nd rollback fixed, with 2 additional tests. Issue was
tail merging/loop info/tail-duplication causing issue with loops that share
a header block.

Issue with early tail-duplication of blocks that branch to a fallthrough
predecessor fixed with test case: tail-dup-branch-to-fallthrough.ll

Differential revision: https://reviews.llvm.org/D18226

llvm-svn: 283934
2016-10-11 20:36:43 +00:00
Arnold Schwaighofer 9103e268cf Silence -Wunused-but-set-variable warning
llvm-svn: 283927
2016-10-11 19:49:29 +00:00
Sanjay Patel 8253e15ef3 [DAG] add fold for masked negated sign-extended bool
This enhances the fold added with:
https://reviews.llvm.org/rL283900

llvm-svn: 283905
2016-10-11 17:05:52 +00:00
Sanjay Patel 8384703d9b [DAG] add fold for masked negated extended bool
The non-obvious motivation for adding this fold (which already happens in InstCombine)
is that we want to canonicalize IR towards select instructions and canonicalize DAG 
nodes towards boolean math. So we need to recreate some folds in the DAG to handle that
change in direction. 

An interesting implementation difference for cases like this is that InstCombine
generally works top-down while the DAG goes bottom-up. That means we need to detect 
different patterns. In this case, the SimplifyDemandedBits fold prevents us from 
performing a zext to sext fold that would then be recognized as a negation of a sext. 

llvm-svn: 283900
2016-10-11 16:26:36 +00:00
Sanjay Patel 38a42e4bfa [DAG] simplify logic; NFC
llvm-svn: 283885
2016-10-11 14:14:30 +00:00
Sanjay Patel 907ae69125 [DAG] hoist DL(N) and fix formatting; NFC
llvm-svn: 283884
2016-10-11 14:04:24 +00:00
Sanjay Patel 9609f3d6c7 [DAG] fix formatting; NFC
llvm-svn: 283878
2016-10-11 13:47:43 +00:00
Fraser Cormack 48d9fdc1e1 Fix formatting in findRegisterUseOperandIdx. NFC.
llvm-svn: 283860
2016-10-11 09:09:21 +00:00
Daniel Jasper 0c42dc4784 Revert "Codegen: Tail-duplicate during placement."
This reverts commit r283842.

test/CodeGen/X86/tail-dup-repeat.ll causes and llc crash with our
internal testing. I'll share a link with you.

llvm-svn: 283857
2016-10-11 07:36:11 +00:00
Matthias Braun e67fc4a328 Fix warning; NFC
llvm-svn: 283851
2016-10-11 04:32:03 +00:00
Matthias Braun 3d85ebe5b1 MIRParser: generic register operands with types
This should fix the fallout of r283848.

llvm-svn: 283850
2016-10-11 04:22:29 +00:00
Matthias Braun 74ad41c7cd MIRParser: Rewrite register info initialization; mostly NFC
This changes MachineRegisterInfo to be initializes after parsing all
instructions. This is in preparation for upcoming commits that allow the
register class specification on the operand or deduce them from the
MCInstrDesc.

This commit removes the unused feature of having nonsequential register
numbers. This was confusing anyway as the vreg numbers would be
different after parsing when you had "holes" in your numbering.

This patch also introduces the concept of an incomplete virtual
register. An incomplete virtual register may be used during .mir parsing
to construct MachineOperands without knowing the exact register class
(or register bank) yet.

NFC except for some error messages.

Differential Revision: https://reviews.llvm.org/D22397

llvm-svn: 283848
2016-10-11 03:13:01 +00:00
Kyle Butt ae068a320c Codegen: Tail-duplicate during placement.
The tail duplication pass uses an assumed layout when making duplication
decisions. This is fine, but passes up duplication opportunities that
may arise when blocks are outlined. Because we want the updated CFG to
affect subsequent placement decisions, this change must occur during
placement.

In order to achieve this goal, TailDuplicationPass is split into a
utility class, TailDuplicator, and the pass itself. The pass delegates
nearly everything to the TailDuplicator object, except for looping over
the blocks in a function. This allows the same code to be used for tail
duplication in both places.

This change, in concert with outlining optional branches, allows
triangle shaped code to perform much better, esepecially when the
taken/untaken branches are correlated, as it creates a second spine when
the tests are small enough.

Issue from previous rollback fixed, and a new test was added for that
case as well. Issue was worklist/scheduling/taildup issue in layout.

Issue from 2nd rollback fixed, with 2 additional tests. Issue was
tail merging/loop info/tail-duplication causing issue with loops that share
a header block.

Issue with early tail-duplication of blocks that branch to a fallthrough
predecessor fixed with test case: tail-dup-branch-to-fallthrough.ll

Differential revision: https://reviews.llvm.org/D18226

llvm-svn: 283842
2016-10-11 01:20:33 +00:00
Dylan McKay c328fe5af4 [RegAllocGreedy] Attempt to split unspillable live intervals
Summary:
Previously, when allocating unspillable live ranges, we would never
attempt to split. We would always bail out and try last ditch graph
recoloring.

This patch changes this by attempting to split all live intervals before
performing recoloring.

This fixes LLVM bug PR14879.

I can't add test cases for any backends other than AVR because none of
them have small enough register classes to trigger the bug.

Reviewers: qcolombet

Subscribers: MatzeB

Differential Revision: https://reviews.llvm.org/D25070

llvm-svn: 283838
2016-10-11 01:04:36 +00:00
Tim Northover bdf1624367 GlobalISel: select G_GLOBAL_VALUE uses on AArch64.
llvm-svn: 283809
2016-10-10 21:50:00 +00:00
Hal Finkel fcd2421667 [SelectionDAGBuilder] Support llvm.flt.rounds on targets where i32 is not legal
Add integer expansion for FLT_ROUNDS_ for targets where i32 is not a legal
type.

Patch by Edward Jones, thanks!

Differential Revision: https://reviews.llvm.org/D24459

llvm-svn: 283797
2016-10-10 20:45:15 +00:00
Elena Demikhovsky 5b10aa1f1e DAG: Setting Masked-Expand-Load as a variant of Masked-Load node
Masked-expand-load node represents load operation that loads a variable amount of elements from memory according to amount of "true" bits in the mask and expands the loaded elements according to their position in the mask vector.
Right now, the node is used in intrinsics for VEXPAND* instructions. 
The work is done towards implementation of masked.expandload and masked.compressstore intrinsics.

Differential Revision: https://reviews.llvm.org/D25322

llvm-svn: 283694
2016-10-09 10:48:52 +00:00
Peter Collingbourne 5c924d7117 Target: Remove unused entities.
llvm-svn: 283690
2016-10-09 04:38:57 +00:00
Mehdi Amini 732afdd09a Turn cl::values() (for enum) from a vararg function to using C++ variadic template
The core of the change is supposed to be NFC, however it also fixes
what I believe was an undefined behavior when calling:

 va_start(ValueArgs, Desc);

with Desc being a StringRef.

Differential Revision: https://reviews.llvm.org/D25342

llvm-svn: 283671
2016-10-08 19:41:06 +00:00
Kyle Butt 2facd194a2 Revert "Codegen: Tail-duplicate during placement."
This reverts commit 71c312652c10f1855b28d06697c08d47e7a243e4.

llvm-svn: 283647
2016-10-08 01:47:05 +00:00
Kyle Butt 37e676d857 Codegen: Tail-duplicate during placement.
The tail duplication pass uses an assumed layout when making duplication
decisions. This is fine, but passes up duplication opportunities that
may arise when blocks are outlined. Because we want the updated CFG to
affect subsequent placement decisions, this change must occur during
placement.

In order to achieve this goal, TailDuplicationPass is split into a
utility class, TailDuplicator, and the pass itself. The pass delegates
nearly everything to the TailDuplicator object, except for looping over
the blocks in a function. This allows the same code to be used for tail
duplication in both places.

This change, in concert with outlining optional branches, allows
triangle shaped code to perform much better, esepecially when the
taken/untaken branches are correlated, as it creates a second spine when
the tests are small enough.

Issue from previous rollback fixed, and a new test was added for that
case as well. Issue was worklist/scheduling/taildup issue in layout.

Issue from 2nd rollback fixed, with 2 additional tests. Issue was
tail merging/loop info/tail-duplication causing issue with loops that share
a header block.

Differential revision: https://reviews.llvm.org/D18226

llvm-svn: 283619
2016-10-07 22:33:20 +00:00
Arnold Schwaighofer 3f25658143 swifterror: Don't compute swifterror vregs during instruction selection
The code used llvm basic block predecessors to decided where to insert phi
nodes. Instruction selection can and will liberally insert new machine basic
block predecessors. There is not a guaranteed one-to-one mapping from pred.
llvm basic blocks and machine basic blocks.

Therefore the current approach does not work as it assumes we can mark
predecessor machine basic block as needing a copy, and needs to know the set of
all predecessor machine basic blocks to decide when to insert phis.

Instead of computing the swifterror vregs as we select instructions, propagate
them at the end of instruction selection when the MBB CFG is complete.

When an instruction needs a swifterror vreg and we don't know the value yet,
generate a new vreg and remember this "upward exposed" use, and reconcile this
at the end of instruction selection.

This will only happen if the target supports promoting swifterror parameters to
registers and the swifterror attribute is used.

rdar://28300923

llvm-svn: 283617
2016-10-07 22:06:55 +00:00
Sanjay Patel 14c02052d6 [DAG] clean up foldSelectOfConstants(); NFCI
Rename variables, simplify logic. 
Not clear yet why we don't handle a target with ZeroOrNegativeOneBooleanContent too.

llvm-svn: 283613
2016-10-07 21:55:42 +00:00
Sanjay Patel ecaf343fe7 [DAG] move fold (select C, 0, 1 -> xor C, 1) to a helper function; NFC
We're missing at least 3 other similar folds based on what we have in InstCombine. 

llvm-svn: 283596
2016-10-07 20:47:51 +00:00
Dehao Chen 6e0c8446db Invoke add-discriminator at -g0 -fsample-profile
Summary: -fsample-profile needs discriminator, which will not be added if built with -g0. This patch makes sure the discriminator is added for sample-profile at -g0. A followup patch will be send out to update clang tests.

Reviewers: davidxl, dblaikie, echristo, dnovillo

Subscribers: mehdi_amini, probinson, llvm-commits

Differential Revision: https://reviews.llvm.org/D25132

llvm-svn: 283565
2016-10-07 15:21:31 +00:00
Krzysztof Parzyszek e513e17b23 Only track physical registers in LivePhysRegs
llvm-svn: 283561
2016-10-07 14:50:49 +00:00
Vedant Kumar 7beb423765 Delete some dead code in SelectionDAG (NFC)
Differential Revision: https://reviews.llvm.org/D24435

llvm-svn: 283505
2016-10-06 22:53:43 +00:00
Wolfgang Pieb e51bede1d8 Preserve the debug location when CodeGenPrepare sinks a compare instruction into the
basic block of a user.

Patch by Andrea DiBiagio.

Differential Revision: https://reviews.llvm.org/D24632

llvm-svn: 283500
2016-10-06 21:43:45 +00:00
Pirama Arumuga Nainar cc152ac794 Handle *_EXTEND_VECTOR_INREG during Integer Legalization
Summary:
These nodes need legalization for 3-element vectors.  This commit
handles the legalization and adds tests for zext and sext.

This fixes PR30614.

Reviewers: RKSimon, srhines

Subscribers: llvm-commits

Differential Revision: https://reviews.llvm.org/D25268

llvm-svn: 283496
2016-10-06 21:27:05 +00:00
Michael Kuperstein 7cc2123847 [DAG] Generalize build_vector -> vector_shuffle combine for more than 2 inputs
This generalizes the build_vector -> vector_shuffle combine to support any
number of inputs. The idea is to create a binary tree of shuffles, where
the first layer performs pairwise shuffles of the input vectors placing each
input element into the correct lane, and the rest of the tree blends these
shuffles together.

This doesn't try to be smart and create any sort of "optimal" shuffles.
The assumption is that even a "poor" shuffle sequence is better than extracting
and inserting the elements one by one.

Differential Revision: https://reviews.llvm.org/D24683

llvm-svn: 283480
2016-10-06 18:58:24 +00:00
Matt Arsenault 6bc43d8627 BranchRelaxation: Support expanding unconditional branches
AMDGPU needs to expand unconditional branches in a new
block with an indirect branch.

llvm-svn: 283464
2016-10-06 16:20:41 +00:00
Matt Arsenault ef5bba0136 BranchRelaxation: Account for function alignment
llvm-svn: 283462
2016-10-06 16:00:58 +00:00
Matt Arsenault 36919a4f7c Move AArch64BranchRelaxation to generic code
llvm-svn: 283459
2016-10-06 15:38:53 +00:00
Reid Kleckner bb96df602e [codeview] Truncate records to maximum record size near 64KB
If we don't truncate, LLVM asserts when the label difference doesn't fit
in a 16 bit field. This patch truncates two kinds of data: trailing null
terminated names in symbol records, and inline line tables. The inline
line table test that I have is too large (many MB), so I'm not checking
it in.

Hopefully fixes PR28264.

llvm-svn: 283403
2016-10-05 22:36:07 +00:00
David Callahan c1051ab26e Modify df_iterator to support post-order actions
Summary: This makes a change to the state used to maintain visited information for depth first iterator. We know assume a method "completed(...)" which is called after all children of a node have been visited. In all existing cases, this method does nothing so this patch has no functional changes.  It will however allow a client to distinguish back from cross edges in a DFS tree.

Reviewers: nadav, mehdi_amini, dberlin

Subscribers: MatzeB, mzolotukhin, twoh, freik, llvm-commits

Differential Revision: https://reviews.llvm.org/D25191

llvm-svn: 283391
2016-10-05 21:36:16 +00:00
Reid Kleckner 2b3e6428e5 [codeview] Translate bitpiece metadata to DEFRANGE_SUBFIELD* records
This allows LLVM to describe locations of aggregate variables that have
been split by SROA.

Fixes PR29141

Reviewers: amccarth, majnemer

Differential Revision: https://reviews.llvm.org/D25253

llvm-svn: 283388
2016-10-05 21:21:33 +00:00
Peter Collingbourne d799d28540 FastISel: Remove unused/un-overridden entry points. NFCI.
llvm-svn: 283366
2016-10-05 19:25:20 +00:00
Reid Kleckner f9dddec21c Improve DEBUG_VALUE assembly comments for spilled bitpieces
Previously we would give up when we saw the bitpiece DWARF expression
and print "[complex expression]" when actually we handled bitpiece
expressions outside the loop.

llvm-svn: 283355
2016-10-05 18:36:02 +00:00
Bjorn Pettersson 12559441bd [DAG] Teach computeKnownBits and ComputeNumSignBits in SelectionDAG to look through EXTRACT_VECTOR_ELT.
Summary: Both computeKnownBits and ComputeNumSignBits can now do a simple
look-through of EXTRACT_VECTOR_ELT. It will compute the result based
on the known bits (or known sign bits) for the vector that the element
is extracted from.

Reviewers: bogner, tstellarAMD, mkuper

Subscribers: wdng, RKSimon, jyknight, llvm-commits, nhaehnle

Differential Revision: https://reviews.llvm.org/D25007

llvm-svn: 283347
2016-10-05 17:40:27 +00:00
Krzysztof Parzyszek e7c72cdbb0 Fix machine operand traversal in ScheduleDAGInstrs::fixupKills
llvm-svn: 283315
2016-10-05 13:15:06 +00:00
Mehdi Amini 149f6eaed9 Re-commit "Use StringRef in Support/Darf APIs (NFC)"
This reverts commit r283285 and re-commit r283275 with
a fix for format("%s", Str); where Str is a StringRef.

llvm-svn: 283298
2016-10-05 05:59:29 +00:00
Kyle Butt 25ac35d822 Revert "Codegen: Tail-duplicate during placement."
This reverts commit 062ace9764953e9769142c1099281a345f9b6bdc.

Issue with loop info and block removal revealed by polly.
I have a fix for this issue already in another patch, I'll re-roll this
together with that fix, and a test case.

llvm-svn: 283292
2016-10-05 01:39:29 +00:00
Mehdi Amini 3e021be3b6 Use StringRef in FastISel API (NFC)
llvm-svn: 283291
2016-10-05 01:37:29 +00:00
Mehdi Amini 2bcac0fac4 Revert "Re-commit "Use StringRef in Support/Darf APIs (NFC)""
One test seems randomly broken: DebugInfo/X86/gnu-public-names.ll

llvm-svn: 283285
2016-10-05 01:04:02 +00:00
Mehdi Amini 32b297a42f Re-commit "Use StringRef in Support/Darf APIs (NFC)"
This reverts commit r283278 and re-commit r283275 with
the update to fix the build on the LLDB side.

llvm-svn: 283281
2016-10-05 00:37:18 +00:00
Mehdi Amini 78b04ae7ac Revert "Use StringRef in Support/Darf APIs (NFC)"
This reverts commit r283275, it broke LLDB Android debug server.

llvm-svn: 283278
2016-10-05 00:21:14 +00:00
Mehdi Amini e0327be584 Use StringRef in Support/Darf APIs (NFC)
llvm-svn: 283275
2016-10-04 23:55:40 +00:00
Kyle Butt adabac2d57 Codegen: Tail-duplicate during placement.
The tail duplication pass uses an assumed layout when making duplication
decisions. This is fine, but passes up duplication opportunities that
may arise when blocks are outlined. Because we want the updated CFG to
affect subsequent placement decisions, this change must occur during
placement.

In order to achieve this goal, TailDuplicationPass is split into a
utility class, TailDuplicator, and the pass itself. The pass delegates
nearly everything to the TailDuplicator object, except for looping over
the blocks in a function. This allows the same code to be used for tail
duplication in both places.

This change, in concert with outlining optional branches, allows
triangle shaped code to perform much better, esepecially when the
taken/untaken branches are correlated, as it creates a second spine when
the tests are small enough.

Issue from previous rollback fixed, and a new test was added for that
case as well.

Differential revision: https://reviews.llvm.org/D18226

llvm-svn: 283274
2016-10-04 23:54:18 +00:00
David L Kreitzer 7c7ee89b01 Revert r283248. It caused failures in the hexagon buildbots.
llvm-svn: 283254
2016-10-04 20:57:19 +00:00
Sanjay Patel bfdbea6481 [Target] move reciprocal estimate settings from TargetOptions to TargetLowering
The motivation for the change is that we can't have pseudo-global settings for
codegen living in TargetOptions because that doesn't work with LTO.

Ideally, these reciprocal attributes will be moved to the instruction-level via
FMF, metadata, or something else. But making them function attributes is at least
an improvement over the current state.

The ingredients of this patch are:

    Remove the reciprocal estimate command-line debug option.
    Add TargetRecip to TargetLowering.
    Remove TargetRecip from TargetOptions.
    Clean up the TargetRecip implementation to work with this new scheme.
    Set the default reciprocal settings in TargetLoweringBase (everything is off).
    Update the PowerPC defaults, users, and tests.
    Update the x86 defaults, users, and tests.

Note that if this patch needs to be reverted, the related clang patch checked in
at r283251 should be reverted too.

Differential Revision: https://reviews.llvm.org/D24816

llvm-svn: 283252
2016-10-04 20:46:43 +00:00
David L Kreitzer fedb9b67ca [safestack] Requires a valid TargetMachine to be passed to the SafeStack pass.
Patch by Michael LeMay

Differential revision: http://reviews.llvm.org/D24896

llvm-svn: 283248
2016-10-04 20:31:32 +00:00
whitequark 7c4fe0e9a3 [SelectionDAG] Fix calling convention in expansion of ?MULO.
The SMULO/UMULO DAG nodes, when not directly supported by the target,
expand to a multiplication twice as wide. In case that the resulting
type is not legal, an __mul?i3 intrinsic is used. Since the type is
not legal, the legalizer cannot directly call the intrinsic with
the wide arguments; instead, it "pre-lowers" them by splitting them
in halves.

The "pre-lowering" code in essence made assumptions about
the calling convention, specifically that i(N*2) values will be
split into two iN values and passed in consecutive registers in
little-endian order. This, naturally, breaks on a big-endian system,
such as our OR1K out-of-tree backend.

Thanks to James Miller <james@aatch.net> for help in debugging.

Differential Revision: https://reviews.llvm.org/D25223

llvm-svn: 283203
2016-10-04 09:07:49 +00:00
Kyle Butt 3ffb8529bc Revert "Codegen: Tail-duplicate during placement."
This reverts commit ff234efbe23528e4f4c80c78057b920a51f434b2.

Causing crashes on aarch64 build.

llvm-svn: 283172
2016-10-04 00:38:23 +00:00
Kyle Butt 396bfdd707 Codegen: Tail-duplicate during placement.
The tail duplication pass uses an assumed layout when making duplication
decisions. This is fine, but passes up duplication opportunities that
may arise when blocks are outlined. Because we want the updated CFG to
affect subsequent placement decisions, this change must occur during
placement.

In order to achieve this goal, TailDuplicationPass is split into a
utility class, TailDuplicator, and the pass itself. The pass delegates
nearly everything to the TailDuplicator object, except for looping over
the blocks in a function. This allows the same code to be used for tail
duplication in both places.

This change, in concert with outlining optional branches, allows
triangle shaped code to perform much better, esepecially when the
taken/untaken branches are correlated, as it creates a second spine when
the tests are small enough.

llvm-svn: 283164
2016-10-04 00:00:09 +00:00
Sanjay Patel f7df85af87 fix formatting; NFC
llvm-svn: 283115
2016-10-03 15:18:36 +00:00
Mehdi Amini 7410717a62 Use StringRef in Registry API (NFC)
llvm-svn: 283039
2016-10-01 15:44:54 +00:00
Mehdi Amini 36d33fc109 Use StringRef instead of raw pointers in MCAsmInfo/MCInstrInfo APIs (NFC)
llvm-svn: 283018
2016-10-01 06:46:33 +00:00
Mehdi Amini 48878ae579 Use StringRef in Datalayout API (NFC)
llvm-svn: 283013
2016-10-01 05:57:55 +00:00
Mehdi Amini 217b246484 Revert "Use StringRef in Datalayout API (NFC)"
This reverts commit r283009. Bots are broken.

llvm-svn: 283011
2016-10-01 05:12:48 +00:00
Mehdi Amini 29baf9c0e1 Use StringRef in Datalayout API (NFC)
llvm-svn: 283009
2016-10-01 04:17:59 +00:00
Mehdi Amini 117296c0a0 Use StringRef in Pass/PassManager APIs (NFC)
llvm-svn: 283004
2016-10-01 02:56:57 +00:00
Eric Christopher e8d141c675 Remove getTargetTriple and update all uses to use the Triple off
of the TargetMachine. NFC.

llvm-svn: 283002
2016-10-01 01:50:33 +00:00
Eric Christopher 364dbe06d3 Stop calling getTargetTriple off of the AsmPrinter and constructing a
TargetTriple, just grab it off of the TargetMachine. NFC.

llvm-svn: 283001
2016-10-01 01:50:29 +00:00
Matthias Braun 298e007e99 ScheduleDAGInstrs: Cleanup, use range based for; NFC
llvm-svn: 282979
2016-09-30 23:08:07 +00:00
Reid Kleckner 9cb915b7be [SEH] Emit the parent frame offset label even if there are no funclets
This avoids errors about references to undefined local labels from
unreferenced filter functions.

Fixes (sort of) PR30431

llvm-svn: 282967
2016-09-30 22:10:12 +00:00
Dylan McKay 309eba75b1 Revert "[RegAllocGreedy] Attempt to split unspillable live intervals"
It was accidentally committed.

llvm-svn: 282855
2016-09-30 14:05:15 +00:00
Dylan McKay 2a80cc688a [RegAllocGreedy] Attempt to split unspillable live intervals
Summary:
Previously, when allocating unspillable live ranges, we would never
attempt to split. We would always bail out and try last ditch graph
recoloring.

This patch changes this by attempting to split all live intervals before
performing recoloring.

This fixes LLVM bug PR14879.

I can't add test cases for any backends other than AVR because none of
them have small enough register classes to trigger the bug.

Reviewers: qcolombet

Subscribers: MatzeB

Differential Revision: https://reviews.llvm.org/D25070

llvm-svn: 282852
2016-09-30 13:59:20 +00:00
Adrian McCarthy d1185fc081 Clamp version number in S_COMPILE3 to avoid overflowing 16-bit field.
llvm-svn: 282761
2016-09-29 20:28:25 +00:00
Quentin Colombet 1b01677e61 [RegisterBankInfo] Change the default mapping for Copy and PHI.
Instead of producing a mapping for all the operands, we only generate a
mapping for the definition. Indeed, the other operands are not
constrained by the instruction and thus, we should leave the choice to
the actual definition to do the right thing.

In pratice this is almost NFC, but with one advantage. We will have only
one instance of OperandsMapping for each copy and phi that map to one
register bank instead of one different instance for each different
number of operands for each copy and phi.

llvm-svn: 282756
2016-09-29 19:51:46 +00:00
Reid Kleckner e45b2c7d8e [codeview] Use character types for all byte-sized integer types
The VS debugger doesn't appear to understand the 0x68 or 0x69 type
indices, which were probably intended for use on a platform where a C
'int' is 8 bits. So, use the character types instead. Clang was already
using the character types because '[u]int8_t' is usually defined in
terms of 'char'.

See the Rust issue for screenshots of what VS does:
https://github.com/rust-lang/rust/issues/36646

Fixes PR30552

llvm-svn: 282739
2016-09-29 17:55:01 +00:00
Matthias Braun aae7fe99d0 MachineFunction: Add missing newline in debug print()
Should not be a functional but an aesthetic change.

llvm-svn: 282669
2016-09-29 01:47:42 +00:00
Quentin Colombet 40cbc27ff3 [RegisterBankInfo] Uniquely generate OperandsMapping.
This is a step toward statically allocate InstructionMapping. Like the
previous few commits, the goal is to move toward a TableGen'ed like
structure with no dynamic allocation at all.

This should already improve compile time by getting rid of a bunch of
memmove of SmallVectors.

llvm-svn: 282643
2016-09-28 22:20:49 +00:00
Quentin Colombet 97d2d21d65 [RegisterBankInfo] Rework the APIs of ValueMapping.
This is a preparatory commit for more TableGen-like structure.
NFC

llvm-svn: 282642
2016-09-28 22:20:24 +00:00
Adrian Prantl f8d10ceafe Remove dead code from LiveDebugVariables.cpp (NFC)
LiveDebugVariables doesn't propagate DBG_VALUEs accross basic block
boundaries any more; this functionality was split into LiveDebugValues.
We can thus drop the now dead references to LexicalScopes from LiveDebugVariables.

llvm-svn: 282638
2016-09-28 21:34:23 +00:00
Krzysztof Parzyszek dcb1bcae0b IfConversion: Add implicit uses for redefined regs with live subregisters
Normally, if conversion would add implicit uses for redefined registers,
e.g. R0<def> = add_if ..., R0<imp-use>. However, if only subregisters of
R0 are known to be live but not R0 itself, such implicit uses will not be
added, causing prior definitions of such subregisters and R0 itself to
become dead.

llvm-svn: 282626
2016-09-28 20:07:41 +00:00
Adrian Prantl 7f5866c227 Teach LiveDebugValues about lexical scopes.
This addresses PR26055 LiveDebugValues is very slow.

Contrary to the old LiveDebugVariables pass LiveDebugValues currently
doesn't look at the lexical scopes before inserting a DBG_VALUE
intrinsic. This means that we often propagate DBG_VALUEs much further
down than necessary. This is especially noticeable in large C++
functions with many inlined method calls that all use the same
"this"-pointer.

For example, in the following code it makes no sense to propagate the
inlined variable a from the first inlined call to f() into any of the
subsequent basic blocks, because the variable will always be out of
scope:

void sink(int a);
void __attribute((always_inline)) f(int a) { sink(a); }
void foo(int i) {
   f(i);
   if (i)
     f(i);
   f(i);
}

This patch reuses the LexicalScopes infrastructure we have for
LiveDebugVariables to take this into account.

The effect on compile time and memory consumption is quite noticeable:
I tested a benchmark that is a large C++ source with an enormous
amount of inlined "this"-pointers that would previously eat >24GiB
(most of them for DBG_VALUE intrinsics) and whose compile time was
dominated by LiveDebugValues. With this patch applied the memory
consumption is 1GiB and 1.7% of the time is spent in LiveDebugValues.

https://reviews.llvm.org/D24994
Thanks to Daniel Berlin and Keith Walker for reviewing!

llvm-svn: 282611
2016-09-28 17:51:14 +00:00
Adrian Prantl 16b2ace0ab Rewrite loops to use range-based for. (NFC)
llvm-svn: 282608
2016-09-28 17:31:17 +00:00
Nirav Dave e524f50882 Revert "In visitSTORE, always use FindBetterChain, rather than only when UseAA is enabled."
This reverts commit r282600 due to test failues with MCJIT

llvm-svn: 282604
2016-09-28 16:37:50 +00:00
Nirav Dave e17e055b75 In visitSTORE, always use FindBetterChain, rather than only when UseAA is enabled.
Simplify Consecutive Merge Store Candidate Search

  Now that address aliasing is much less conservative, push through
  simplified store merging search which only checks for parallel stores
  through the chain subgraph. This is cleaner as the separation of
  non-interfering loads/stores from the store-merging logic.

  Whem merging stores, search up the chain through a single load, and
  finds all possible stores by looking down from through a load and a
  TokenFactor to all stores visited. This improves the quality of the
  output SelectionDAG and generally the output CodeGen (with some
  exceptions).

  Additional Minor Changes:

    1. Finishes removing unused AliasLoad code
    2. Unifies the the chain aggregation in the merged stores across
       code paths
    3. Re-add the Store node to the worklist after calling
       SimplifyDemandedBits.
    4. Increase GatherAllAliasesMaxDepth from 6 to 18. That number is
       arbitrary, but seemed sufficient to not cause regressions in
       tests.

  This finishes the change Matt Arsenault started in r246307 and
  jyknight's original patch.

  Many tests required some changes as memory operations are now
  reorderable. Some tests relying on the order were changed to use
  volatile memory operations

  Noteworthy tests:

    CodeGen/AArch64/argument-blocks.ll -
      It's not entirely clear what the test_varargs_stackalign test is
      supposed to be asserting, but the new code looks right.

    CodeGen/AArch64/arm64-memset-inline.lli -
    CodeGen/AArch64/arm64-stur.ll -
    CodeGen/ARM/memset-inline.ll -
      The backend now generates *worse* code due to store merging
      succeeding, as we do do a 16-byte constant-zero store efficiently.

    CodeGen/AArch64/merge-store.ll -
      Improved, but there still seems to be an extraneous vector insert
      from an element to itself?

    CodeGen/PowerPC/ppc64-align-long-double.ll -
      Worse code emitted in this case, due to the improved store->load
      forwarding.

    CodeGen/X86/dag-merge-fast-accesses.ll -
    CodeGen/X86/MergeConsecutiveStores.ll -
    CodeGen/X86/stores-merging.ll -
    CodeGen/Mips/load-store-left-right.ll -
      Restored correct merging of non-aligned stores

    CodeGen/AMDGPU/promote-alloca-stored-pointer-value.ll -
      Improved. Correctly merges buffer_store_dword calls

    CodeGen/AMDGPU/si-triv-disjoint-mem-access.ll -
      Improved. Sidesteps loading a stored value and merges two stores

    CodeGen/X86/pr18023.ll -
      This test has been removed, as it was asserting incorrect
      behavior. Non-volatile stores *CAN* be moved past volatile loads,
      and now are.

    CodeGen/X86/vector-idiv.ll -
    CodeGen/X86/vector-lzcnt-128.ll -
      It's basically impossible to tell what these tests are actually
      testing. But, looks like the code got better due to the memory
      operations being recognized as non-aliasing.

    CodeGen/X86/win32-eh.ll -
      Both loads of the securitycookie are now merged.

    CodeGen/AMDGPU/vgpr-spill-emergency-stack-slot-compute.ll -
      This test appears to work but no longer exhibits the spill
      behavior.

Reviewers: arsenm, hfinkel, tstellarAMD, nhaehnle, jyknight

Subscribers: wdng, nhaehnle, nemanjai, arsenm, weimingz, niravd, RKSimon, aemerson, qcolombet, resistor, tstellarAMD, t.p.northover, spatel

Differential Revision: https://reviews.llvm.org/D14834

llvm-svn: 282600
2016-09-28 15:50:43 +00:00
Michael Kuperstein 3e06eafc20 [DAG] Remove isVectorClearMaskLegal() check from vector_build dagcombine
This check currently doesn't seem to do anything useful on any in-tree target:
On non-x86, it always evaluates to false, so we never hit the code path that
creates the shuffle with zero.
On x86, it just forwards to isShuffleMaskLegal(), which is a reasonable thing to
query in general, but doesn't make sense if only restricted to zero blends.

Differential Revision: https://reviews.llvm.org/D24625

llvm-svn: 282567
2016-09-28 06:13:58 +00:00
Geoff Berry b124331db7 [TargetRegisterInfo, AArch64] Add target hook for isConstantPhysReg().
Summary:
The current implementation of isConstantPhysReg() checks for defs of
physical registers to determine if they are constant.  Some
architectures (e.g. AArch64 XZR/WZR) have registers that are constant
and may be used as destinations to indicate the generated value is
discarded, preventing isConstantPhysReg() from returning true.  This
change adds a TargetRegisterInfo hook that overrides the no defs check
for cases such as this.

Reviewers: MatzeB, qcolombet, t.p.northover, jmolloy

Subscribers: junbuml, aemerson, mcrosier, rengolin

Differential Revision: https://reviews.llvm.org/D24570

llvm-svn: 282543
2016-09-27 22:17:27 +00:00
Keith Walker 83ebef5db3 Propagate DBG_VALUE entries when there are unvisited predecessors
Variables are sometimes missing their debug location information in
blocks in which the variables should be available. This would occur
when one or more predecessor blocks had not yet been visited by the
routine which propagated the information from predecessor blocks.

This is addressed by only considering predecessor blocks which have
already been visited.

The solution to this problem was suggested by Daniel Berlin on the
LLVM developer mailing list.

Differential Revision: https://reviews.llvm.org/D24927

llvm-svn: 282506
2016-09-27 16:46:07 +00:00
Evandro Menezes e45de8a5ec Add support to optionally limit the size of jump tables.
Many high-performance processors have a dedicated branch predictor for
indirect branches, commonly used with jump tables.  As sophisticated as such
branch predictors are, they tend to have well defined limits beyond which
their effectiveness is hampered or even nullified.  One such limit is the
number of possible destinations for a given indirect branches that such
branch predictors can handle.

This patch considers a limit that a target may set to the number of
destination addresses in a jump table.

Patch by: Evandro Menezes <e.menezes@samsung.com>, Aditya Kumar
<aditya.k7@samsung.com>, Sebastian Pop <s.pop@samsung.com>.

Differential revision: https://reviews.llvm.org/D21940

llvm-svn: 282412
2016-09-26 15:32:33 +00:00
James Molloy 9abb2fa5bb [ARM] Promote small global constants to constant pools
If a constant is unamed_addr and is only used within one function, we can save
on the code size and runtime cost of an indirection by changing the global's storage
to inside the constant pool. For example, instead of:

      ldr r0, .CPI0
      bl printf
      bx lr
    .CPI0: &format_string
    format_string: .asciz "hello, world!\n"

We can emit:

      adr r0, .CPI0
      bl printf
      bx lr
    .CPI0: .asciz "hello, world!\n"

This can cause significant code size savings when many small strings are used in one
function (4 bytes per string).

This recommit contains fixes for a nasty bug related to fast-isel fallback - because
fast-isel doesn't know about this optimization, if it runs and emits references to
a string that we inline (because fast-isel fell back to SDAG) we will end up
with an inlined string and also an out-of-line string, and we won't emit the
out-of-line string, causing backend failures.

It also contains fixes for emitting .text relocations which made the sanitizer
bots unhappy.

llvm-svn: 282387
2016-09-26 07:26:24 +00:00
Ayman Musa d7a5ed4141 [X86][avx512] Fix bug in masked compress store.
Differential Revision: https://reviews.llvm.org/D23984

llvm-svn: 282381
2016-09-26 06:22:08 +00:00
Quentin Colombet d816bfb282 [RegisterBankInfo] Constify the member of the XXXMapping maps.
This makes it obvious that items in those maps behave like statically
created objects.

llvm-svn: 282327
2016-09-24 04:54:03 +00:00
Quentin Colombet a50813f608 [RegisterBankInfo] Add statistics for dynamic value mappings.
Like partial mappings, as we move toward TableGen'ed information, the
number should reach zero eventually.

llvm-svn: 282325
2016-09-24 04:53:55 +00:00
Quentin Colombet fd8c95adf4 [RegisterBankInfo] Uniquely generate ValueMapping.
This is a step toward statically allocate ValueMapping. Like the
previous few commits, the goal is to move toward a TableGen'ed like
structure with no dynamic allocation at all.

llvm-svn: 282324
2016-09-24 04:53:52 +00:00
Quentin Colombet 8159de4de9 [RegisterBankInfo] Keep valid pointers for PartialMappings.
Previously we were using the address of the unique instance of a partial
mapping in the related map to access this instance. However, when the
map grows, the whole set of instances may be moved elsewhere and the
previous addresses are not valid anymore.

Instead, keep the address of the unique heap allocated instance of a
partial mapping.

Note: I did not see any actual bugs for that problem as the number of
partial mappings dynamically allocated is small (<= 4).

llvm-svn: 282323
2016-09-24 04:53:48 +00:00
Matthias Braun 729c989083 llc: Add -start-before/-stop-before options
Differential Revision: https://reviews.llvm.org/D23089

llvm-svn: 282302
2016-09-23 21:46:02 +00:00
Quentin Colombet 380cd88cfd [ResetMachineFunction] Populate the comments in the header of the file.
NFC

llvm-svn: 282276
2016-09-23 18:38:15 +00:00
Quentin Colombet 0f4c20a1e7 [ResetMachineFunction] Add statistic on the number of reset functions.
As the development of GlobalISel move forward, this statistic should
strictly decrease until it reaches zero. At this point, it would mean
GlobalISel can replace SDISel (at least on the tested inputs :P).

llvm-svn: 282275
2016-09-23 18:38:13 +00:00
Quentin Colombet d4c02437c1 [RegisterBankInfo] Add statistics for dynamic partial mappings.
Collect statistics about the number of partial mappings dynamically
allocated and accessed. Ultimately, when the whole TableGen
infrastructure is set, those numbers should be zero.

llvm-svn: 282274
2016-09-23 18:38:06 +00:00
Matthias Braun 1acb55e67c ScheduleDAG: Match enum names when printing sdep kinds
It is less confusing to have the same names in the debug print as the
enum members.

llvm-svn: 282273
2016-09-23 18:28:31 +00:00
Quentin Colombet c13ea88a14 [RegBankSelect] Use DEBUG_TYPE instead of repeating the name of the pass
NFC

llvm-svn: 282267
2016-09-23 17:50:06 +00:00
Quentin Colombet 09a9edcc26 [RegisterBank] Mark the dump method with LLVM_DUMP_METHOD.
NFC

llvm-svn: 282266
2016-09-23 17:50:03 +00:00
Quentin Colombet ceb71c2f43 [RegisterBankInfo] Mark the dump methods with LLVM_DUMP_METHOD.
NFC

llvm-svn: 282221
2016-09-23 00:59:12 +00:00
Quentin Colombet 1946580cf2 [RegisterBankInfo] Check that the mapping covers the interesting bits.
In the verify method of the ValueMapping class we used to check that the
mapping exactly matches the bits of the input value. This is problematic
for statically allocated mappings because we would need a different
mapping for each different size of the value that maps on one
instruction. For instance, with such scheme, we would need a different
mapping for a value of size 1, 5, 23 whereas they all end up on a 32-bit
wide instruction.

Therefore, change the verifier to check that the meaningful bits are
covered by the mapping instead of matching them.

llvm-svn: 282214
2016-09-23 00:14:34 +00:00
Quentin Colombet 0afa7d6b82 [RegisterBankInfo] Use array instead of SmallVector for BreakDown.
This is another step toward TableGen'ed like structures. The BreakDown of
the mapping of the value will be statically computed by TableGen, thus
we only have to point to the right entry in the table instead of
dynamically allocate the mapping for each instruction.

We still support the dynamic allocation through a factory of
PartialMapping to ease the bring-up of the targets while the TableGen
backend is not available.

llvm-svn: 282213
2016-09-23 00:14:30 +00:00
Matthias Braun 5f8492e2ce MachineScheduler: Slightly simplify release node
llvm-svn: 282201
2016-09-22 21:39:56 +00:00
Matthias Braun 46533e614b MachineScheduler: Remove ineffective heuristic; NFC
Currently all nodes get added to the NextSU list when they are released,
so any candidate must be in that list, making the heuristic ineffective.
Remove it for now, we can add it back later in a working fashion if
necessary.

llvm-svn: 282200
2016-09-22 21:39:52 +00:00
Hans Wennborg c4b1d20ba2 Win64: Don't emit unwind info for "leaf" functions (PR30337)
According to MSDN (see the PR), functions which don't touch any callee-saved
registers (including %rsp) don't need any unwind info.

This patch makes LLVM not emit unwind info for such functions, to save
binary size.

Differential Revision: https://reviews.llvm.org/D24748

llvm-svn: 282185
2016-09-22 19:50:05 +00:00
Nirav Dave 9011da3d44 [DAG] Fix incorrect alignment of ext load.
Correctly use alignment size from loaded size not output value size.

Reviewers: jyknight, tstellarAMD, arsenm

Subscribers: llvm-commits

Differential Revision: https://reviews.llvm.org/D23356

llvm-svn: 282177
2016-09-22 17:28:43 +00:00
Tim Northover a5e38fa00d GlobalISel: handle stack-based parameters on AArch64.
llvm-svn: 282153
2016-09-22 13:49:25 +00:00
Quentin Colombet 6a76323c64 [RegisterBankInfo] Move to statically allocated RegisterBank.
This commit is basically the first step toward what will
RegisterBankInfo look when it gets TableGen'ed.

It introduces a XXXGenRegisterBankInfo.def file that is what TableGen
will issue at some point. Moreover, the RegBanks field in
RegisterBankInfo changed to reflect the static (compile time) aspect of
the information.

llvm-svn: 282131
2016-09-22 02:10:37 +00:00
Quentin Colombet b202eb88aa [RegisterBankInfo] Take advantage of the extra argument of SmallVector::resize.
When initializing an instance of OperandsMapper, instead of using
SmallVector::resize followed by std::fill, use the function that
directly does that in SmallVector.

llvm-svn: 282130
2016-09-22 02:10:32 +00:00
Davide Italiano 0f8b5d65f6 [MIRParser] Delete dead code. NFCI.
llvm-svn: 282098
2016-09-21 18:26:08 +00:00
Arnold Schwaighofer de2490d0dc Disable tail calls if there is an swifterror argument
ISel does not handle them correctly yet i.e we crash trying to emit tail call
code.

radar://28407842

llvm-svn: 282088
2016-09-21 16:53:36 +00:00
Tim Northover 9a46718378 GlobalISel: produce correct code for signext/zeroext ABI flags.
We still don't really have an equivalent of "AssertXExt" in DAG, so we don't
exploit the guarantees on the receiving side yet, but this should produce
conservatively correct code on iOS ABIs.

llvm-svn: 282069
2016-09-21 12:57:45 +00:00
Tim Northover 862758ec14 GlobalISel: pass Function to lowerFormalArguments directly (NFC).
The only implementation that exists immediately looks it up anyway, and the
information is needed to handle various parameter attributes (stored on the
function itself).

llvm-svn: 282068
2016-09-21 12:57:35 +00:00
Eric Christopher c4636b3002 Revert "Remove extra argument used once on
TargetMachine::getNameWithPrefix and inline the result into the singular
caller." and "Remove more guts of TargetMachine::getNameWithPrefix and
migrate one check to the TLOF mach-o version." temporarily until I can
get the whole call migrated out of the TargetMachine as we could hit
places where TLOF isn't valid.

This reverts commits r281981 and r281983.

llvm-svn: 282028
2016-09-20 22:03:28 +00:00
George Burgess IV b44b6d98b3 [CodeGen] stop short-circuiting the SSP code for sspstrong.
This check caused us to skip adding layout information for calls to
alloca in sspreq/sspstrong mode. We check properly for sspstrong later
on (and add the correct layout info when doing so), so removing this
shouldn't hurt.

No test is included, since testing this using lit seems to require
checking for exact offsets in asm, which is something that the lit tests
for this avoid. If someone cares deeply, I'm happy to write a unittest
or something to cover this, but that feels like overkill.

Patch by Daniel Micay.

Differential Revision: https://reviews.llvm.org/D22714

llvm-svn: 282022
2016-09-20 21:30:01 +00:00
Petr Hosek 1290355f5a Mark ELF sections whose name start with .note as note
Previously, such section would be marked as SHT_PROGBITS which
makes it impossible to use an initialized C variable declaration
to emit an (allocated) ELF note. The new behavior is also consistent
with ELF assembly parser.

Differential Revision: https://reviews.llvm.org/D24692

llvm-svn: 282010
2016-09-20 20:21:13 +00:00
Adrian McCarthy ad8ac54d10 Fix syntactical nit from r281990.
llvm-svn: 281991
2016-09-20 17:42:13 +00:00
Adrian McCarthy c64acfd4c2 Emit S_COMPILE3 CodeView record
CodeView has an S_COMPILE3 record to identify the compiler and source language of the compiland.  This record comes first in the debug$S section for the compiland. The debuggers rely on this record to know the source language of the code.

There was a little test fallout from introducing a new record into the symbols subsection.

Differential Revision: https://reviews.llvm.org/D24317

llvm-svn: 281990
2016-09-20 17:20:51 +00:00
Eric Christopher a1ccdc3433 Remove more guts of TargetMachine::getNameWithPrefix and migrate one check to the TLOF mach-o version.
NFC intended.

llvm-svn: 281983
2016-09-20 16:05:02 +00:00
Eric Christopher 0be7793d75 Remove extra argument used once on TargetMachine::getNameWithPrefix and inline the result into the singular caller.
llvm-svn: 281981
2016-09-20 16:04:50 +00:00
Keith Walker f83a19ffc2 Improve the -debug output for Debug Range Extension (NFC)
Include header messages and remove unnecessary blank lines.

llvm-svn: 281980
2016-09-20 16:04:31 +00:00
Tim Northover b18ea162df GlobalISel: split aggregates for PCS lowering
This should match the existing behaviour for passing complicated struct and
array types, in particular HFAs come through like that from Clang.

For C & C++ we still need to somehow support all the weird ABI flags, or at
least those that are present in the IR (signext, byval, ...), and stack-based
parameter passing.

llvm-svn: 281977
2016-09-20 15:20:36 +00:00
Matthias Braun 4040c0f4ec BranchFolder: Fix invalid undef flags after merge.
It is legal to merge instructions with different undef flags; However we
must drop the undef flag from the merged instruction if it isn't present
everywhere.

This fixes http://llvm.org/PR30199

llvm-svn: 281957
2016-09-20 01:14:42 +00:00
Quentin Colombet ae273b6ba2 [RegisterBankInfo] Adapt call to std::fill due to use of SmallVector.
This was meant to be commited with my previous commit.

llvm-svn: 281948
2016-09-19 23:18:47 +00:00
Quentin Colombet a7330ba5ac [RegisterBankInfo] Avoid heap allocation in most cases.
The OperandsMapper class is used heavy in RegBankSelect and each
instantiation triggered a heap allocation for the array of operands.
Instead, use a SmallVector with a big enough size such that most of the
cases do not have to use dynamically allocated memory.

This improves the compile time of the RegBankSelect pass.

llvm-svn: 281916
2016-09-19 17:33:55 +00:00
Matthias Braun e1d9628bba LiveRangeCalc: Fix reporting of invalid vreg usage in liveness calculation
Machine programs need a definition of each vreg before reaching a use
(the definition may come from an IMPLICIT_DEF instruction). This class
of errors is not detected by the MachineVerifier because of efficiency
concerns. LiveRangeCalc used to report these problems, make it do that
again (followup to r279625).

Also use report_fatal_error() instead of llvm_unreachable() as the error
reporting is only present in asserts build anyway.

llvm-svn: 281914
2016-09-19 16:49:45 +00:00
Dean Michael Berris 4640154446 [XRay] ARM 32-bit no-Thumb support in LLVM
This is a port of XRay to ARM 32-bit, without Thumb support yet. The XRay instrumentation support is moving up to AsmPrinter.
This is one of 3 commits to different repositories of XRay ARM port. The other 2 are:

https://reviews.llvm.org/D23932 (Clang test)
https://reviews.llvm.org/D23933 (compiler-rt)

Differential Revision: https://reviews.llvm.org/D23931

llvm-svn: 281878
2016-09-19 00:54:35 +00:00
Craig Topper af5ee86bc9 [AVX-512] Don't lower CVTPD2PS intrinsics to ISD::FP_ROUND with an X86 rounding mode encoding in the second operand. This immediate should only be 0 or 1 and indicates if the truncation loses precision.
Also enhance an assert in SelectionDAG::getNode to flag this sort of problem in the future.

llvm-svn: 281868
2016-09-18 21:49:32 +00:00
Simon Pilgrim 6c21e6a54e [X86][SSE] Improve recognition of uitofp conversions that can be performed as sitofp
With D24253 we can now use SelectionDAG::SignBitIsZero with vector operations.

This patch uses SelectionDAG::SignBitIsZero to recognise that a zero sign bit means that we can use a sitofp instead of a uitofp (which is not directly support on pre-AVX512 hardware).

While AVX512 does provide support for uitofp, the conversion to sitofp should not cause any regressions.

Differential Revision: https://reviews.llvm.org/D24343

llvm-svn: 281852
2016-09-18 12:45:23 +00:00
Wei Mi ab24cd189f Change the order of the splitted store from high - low to low - high.
It is a trivial change which could make the testcase easier to be reused
for the store splitting in CodeGenPrepare.

llvm-svn: 281846
2016-09-18 06:10:32 +00:00
Mehdi Amini a53d49e1b5 Don't create a SymbolTable in Function when the LLVMContext discards value names (NFC)
The ValueSymbolTable is used to detect name conflict and rename
instructions automatically. This is not needed when the value
names are automatically discarded by the LLVMContext.
No functional change intended, just saving a little bit of memory.

This is a recommit of r281806 after fixing the accessor to return
a pointer instead of a reference and updating all the call-sites.

llvm-svn: 281813
2016-09-17 06:00:02 +00:00
Mehdi Amini 05188a646d [MIR Parser] Fix Build!
Last-second refactoring before push was bad idea...

llvm-svn: 281812
2016-09-17 05:41:02 +00:00
Mehdi Amini 8d904e9534 MIR Parser: issue an error when the Context discard value names.
This is in line with the LLParser behavior

llvm-svn: 281811
2016-09-17 05:33:58 +00:00
Evgeniy Stepanov aa84f050fc [safestack] Fix assertion failure in stack coloring.
This is a fix for PR30318.

Clang may generate IR where an alloca is already live when entering a
BB with lifetime.start. In this case, conservatively extend the
alloca lifetime all the way back to the block entry.

llvm-svn: 281784
2016-09-16 22:04:10 +00:00
Quentin Colombet 318582f9f9 [RegAllocGreedy] Fix the list of NewVRegs for last chance recoloring.
When trying to recolor a register we may split live-ranges in the
process. When we create new live-ranges we will have to process them,
but when we move a register from Assign to Split, the allocation is not
changed until the whole recoloring session is successful.
Therefore, only push the live-ranges that changed from Assign to
Split when the recoloring is successful.

Same as the previous commit, I was not able to produce a test case that
reproduce the problem with in-tree targets.

Note: The bug has been here since the recoloring scheme has been added
back in r200883 (Feb 2014).

llvm-svn: 281783
2016-09-16 22:00:50 +00:00
Quentin Colombet 631768659d [RegAllocGreedy] Fix an assertion and condition when last chance recoloring is used.
When last chance recoloring is used, the list of NewVRegs may not be
empty when calling selectOrSplitImpl. Indeed, another coloring may have
taken place with splitting/spilling in the same recoloring session.

Relax an assertion to take this into account and adapt a condition to
act as if the NewVRegs were local to this selectOrSplitImpl instance.

Unfortunately I am unable to produce a test case for this, I was only
able to reproduce the conditions on an out-of-tree target.

llvm-svn: 281782
2016-09-16 22:00:42 +00:00
Ahmed Bougacha 74db8faa71 [AArch64][GlobalISel] Test default regbank mapping for G_ICMP.
Also relax a RegisterBankInfo verifier check that's incompatible with
1-bit mappings.

llvm-svn: 281735
2016-09-16 14:44:54 +00:00
Keith Walker 830a8c1fbd Place the lowered phi instruction(s) before the DEBUG_VALUE entry
When a phi node is finally lowered to a machine instruction it is
important that the lowered "load" instruction is placed before the
associated DEBUG_VALUE entry describing the value loaded.

Renamed the existing SkipPHIsAndLabels to SkipPHIsLabelsAndDebug to
more fully describe that it also skips debug entries. Then used the
"new" function SkipPHIsAndLabels when the debug information should not
be skipped when placing the lowered "load" instructions so that it is
placed before the debug entries.

Differential Revision: https://reviews.llvm.org/D23760 

llvm-svn: 281727
2016-09-16 14:07:29 +00:00
Eric Christopher 4367c7fb9a Move the Mangler from the AsmPrinter down to TLOF and clean up the
TLOF API accordingly.

llvm-svn: 281708
2016-09-16 07:33:15 +00:00
Tim Northover 22d82cf179 GlobalISel: legalize GEP instructions with small offsets.
llvm-svn: 281602
2016-09-15 11:02:19 +00:00
Tim Northover 4cf0a482bc GlobalISel: relax type constraints on G_ICMP to allow pointers.
llvm-svn: 281600
2016-09-15 10:40:38 +00:00
Tim Northover 32a078ad1a GlobalISel: remove "unsized" LLT
It was only really there as a sentinel when instructions had to have precisely
one type. Now that registers are typed, each register really has to have a type
that is sized.

llvm-svn: 281599
2016-09-15 10:09:59 +00:00
Tim Northover 5ae8350af6 GlobalISel: cache pointer sizes in LLT
Otherwise everything that needs to work out what size they are has to keep a
DataLayout handy, which is a bit silly and very annoying.

llvm-svn: 281597
2016-09-15 09:20:34 +00:00
Reid Kleckner 4ad127f5fc Fix indentation in codeview code
llvm-svn: 281542
2016-09-14 21:49:21 +00:00
Matt Arsenault 1b9fc8ed65 Finish renaming remaining analyzeBranch functions
llvm-svn: 281535
2016-09-14 20:43:16 +00:00
Sanjoy Das 23f06e53d8 [Stackmap] Added callsite counts to emitted function information.
Summary:
It was previously not possible for tools to use solely the stackmap
information emitted to reconstruct the return addresses of callsites in
the map, which is necessary to use the information to walk a stack. This
patch adds per-function callsite counts when emitting the stackmap
section in order to resolve the problem. Note that this slightly alters
the stackmap format, so external tools parsing these maps will need to
be updated.

**Problem Details:**
Records only store their offset from the beginning of the function they
belong to. While these records and the functions are output in program
order, it is not possible to determine where the end of one function's
records are without the callsite count when processing the records to
compute return addresses.

Patch by Kavon Farvardin!

Reviewers: atrick, ributzka, sanjoy

Subscribers: nemanjai

Differential Revision: https://reviews.llvm.org/D23487

llvm-svn: 281532
2016-09-14 20:22:03 +00:00
Matt Arsenault e8e0f5cac6 Make analyzeBranch family of instruction names consistent
analyzeBranch was renamed to use lowercase first, rename
the related set to match.

llvm-svn: 281506
2016-09-14 17:24:15 +00:00
Sanjay Patel 284582b6d4 getValueType().getScalarSizeInBits() -> getScalarValueSizeInBits(), round 2 ; NFCI
llvm-svn: 281498
2016-09-14 16:54:10 +00:00
Sanjay Patel 1ed771f5d7 getVectorElementType().getSizeInBits() -> getScalarSizeInBits() ; NFCI
llvm-svn: 281495
2016-09-14 16:37:15 +00:00
Sanjay Patel b1f0a0f4a8 getValueType().getSizeInBits() -> getValueSizeInBits() ; NFCI
llvm-svn: 281493
2016-09-14 16:05:51 +00:00
Sanjay Patel 5f6bb6cd24 getValueType().getScalarSizeInBits() -> getScalarValueSizeInBits() ; NFCI
llvm-svn: 281490
2016-09-14 15:43:44 +00:00
Sanjay Patel bd6fca1419 getScalarType().getSizeInBits() -> getScalarSizeInBits() ; NFCI
llvm-svn: 281489
2016-09-14 15:21:00 +00:00
Silviu Baranga 0a020f0fb0 [StackProtector] Use INITIALIZE_TM_PASS instead of INITIALIZE_PASS
in order to make sure that its TargetMachine constructor is
registered.

This allows us to run the PEI machine pass with MIR input
(see PR30324).

llvm-svn: 281474
2016-09-14 14:09:43 +00:00
Pawel Bylica c397f0b272 [CodeGen] Fix invalid shift in mul expansion
Summary: When expanding mul in type legalization make sure the type for shift amount can actually fit the value. This fixes PR30354 https://llvm.org/bugs/show_bug.cgi?id=30354.

Reviewers: hfinkel, majnemer, RKSimon

Subscribers: RKSimon, llvm-commits

Differential Revision: https://reviews.llvm.org/D24478

llvm-svn: 281403
2016-09-13 21:55:41 +00:00
Michael Kuperstein 59f8305305 [DAG] Allow build-to-shuffle combine to combine builds from two wide vectors.
This allows us to, in some cases, create a vector_shuffle out of a build_vector, when
the inputs to the build are extract_elements from two different vectors, at least one
of which is wider than the output. (E.g. a <8 x i16> being constructed out of
elements from a <16 x i16> and a <8 x i16>).

Differential Revision: https://reviews.llvm.org/D24491

llvm-svn: 281402
2016-09-13 21:53:32 +00:00
Matt Arsenault 25dba30017 AMDGPU: Support commuting a FrameIndex operand
llvm-svn: 281369
2016-09-13 19:03:12 +00:00
Simon Pilgrim 4a8eba3e96 [DAGCombiner] Use APInt directly in (shl (zext (srl x, C)), C) combine range test
To avoid assertion, we must ensure that the inner shift constant is within range before calling ConstantSDNode::getZExtValue(). We already know that the outer shift constant is in range.

Followup to D23007

llvm-svn: 281362
2016-09-13 18:33:29 +00:00
Simon Pilgrim bd28a85d14 [DAGCombiner] Use APInt directly in (shl (ext (shl x, c1)), c2) combine
Fix failure to detect out of range shift constants leading to assert in ConstantSDNode::getZExtValue()

Followup to D23007

llvm-svn: 281354
2016-09-13 17:15:28 +00:00
Ayman Musa 0c2da88f82 Remove MVT:i1 xor instruction before SELECT. (Performance improvement).
Differential Revision: https://reviews.llvm.org/D23764

llvm-svn: 281308
2016-09-13 09:12:45 +00:00
Peter Collingbourne d4135bbc30 DebugInfo: New metadata representation for global variables.
This patch reverses the edge from DIGlobalVariable to GlobalVariable.
This will allow us to more easily preserve debug info metadata when
manipulating global variables.

Fixes PR30362. A program for upgrading test cases is attached to that
bug.

Differential Revision: http://reviews.llvm.org/D20147

llvm-svn: 281284
2016-09-13 01:12:59 +00:00
Michael Kuperstein efc0667583 [DAG] Refactor BUILD_VECTOR combine to make it easier to extend. NFCI.
This should make it easier to add cases that we currently don't cover,
like supporting more kinds of type mismatches and more than 2 input vectors.

llvm-svn: 281283
2016-09-13 00:57:43 +00:00
Dehao Chen c32d71253c Fix the bug introduced in r281252.
llvm-svn: 281253
2016-09-12 20:29:54 +00:00
Dehao Chen 9bbb941acf Lower consecutive select instructions correctly.
Summary: If consecutive select instructions are lowered separately in CGP, it will introduce redundant condition check and branches that cannot be removed by later optimization phases. This patch lowers all consecutive select instructions at the same to to avoid inefficent code as demonstrated in https://llvm.org/bugs/show_bug.cgi?id=29095

Reviewers: davidxl

Subscribers: vsk, llvm-commits

Differential Revision: https://reviews.llvm.org/D24147

llvm-svn: 281252
2016-09-12 20:23:28 +00:00
Ahmed Bougacha 925961b20c [GlobalISel] Fix mismatched "<..)" in intrinsic MO printing. NFC.
llvm-svn: 281229
2016-09-12 16:21:49 +00:00
Ahmed Bougacha b678219aa6 [BranchFolding] Unique added live-ins after hoisting code.
We're not supposed to have duplicate live-ins.

llvm-svn: 281224
2016-09-12 16:05:31 +00:00
Tim Northover 032548fc5e GlobalISel: support translation of global addresses.
llvm-svn: 281207
2016-09-12 12:10:41 +00:00
Tim Northover a7653b3919 GlobalISel: translate GEP instructions.
Unlike SDag, we use a separate G_GEP instruction (much simplified, only taking
a single byte offset) to preserve the pointer type information through
selection.

llvm-svn: 281205
2016-09-12 11:20:22 +00:00
Tim Northover d28d3cc079 GlobalISel: disambiguate types when printing MIR
Some generic instructions have multiple types. While in theory these always be
discovered by inspecting the single definition of each generic vreg, in
practice those definitions won't always be local and traipsing through a big
function to find them will not be fun.

So this changes MIRPrinter to print out the type of uses as well as defs, if
they're known to be different or not known to be the same.

On the parsing side, we're a little more flexible: provided each register is
given a type in at least one place it's mentioned (and all types are
consistent) we accept the MIR. This doesn't introduce ambiguity but makes
writing tests manually a bit less painful.

llvm-svn: 281204
2016-09-12 11:20:10 +00:00
Craig Topper 7600794dde [TwoAddressInstruction] When commuting an instruction don't assume that the destination register is operand 0. Pass it from the caller.
In practice it probably is 0 so this may not be a functional change.

llvm-svn: 281180
2016-09-11 22:10:42 +00:00
Duncan P. N. Exon Smith 1872096f1e CodeGen: Give MachineBasicBlock::reverse_iterator a handle to the current MI
Now that MachineBasicBlock::reverse_instr_iterator knows when it's at
the end (since r281168 and r281170), implement
MachineBasicBlock::reverse_iterator directly on top of an
ilist::reverse_iterator by adding an IsReverse template parameter to
MachineInstrBundleIterator.  This replaces another hard-to-reason-about
use of std::reverse_iterator on list iterators, matching the changes for
ilist::reverse_iterator from r280032 (see the "out of scope" section at
the end of that commit message).  MachineBasicBlock::reverse_iterator
now has a handle to the current node and has obvious invalidation
semantics.

r280032 has a more detailed explanation of how list-style reverse
iterators (invalidated when the pointed-at node is deleted) are
different from vector-style reverse iterators like std::reverse_iterator
(invalidated on every operation).  A great motivating example is this
commit's changes to lib/CodeGen/DeadMachineInstructionElim.cpp.

Note: If your out-of-tree backend deletes instructions while iterating
on a MachineBasicBlock::reverse_iterator or converts between
MachineBasicBlock::iterator and MachineBasicBlock::reverse_iterator,
you'll need to update your code in similar ways to r280032.  The
following table might help:

                  [Old]              ==>             [New]
        delete &*RI, RE = end()                   delete &*RI++
        RI->erase(), RE = end()                   RI++->erase()
      reverse_iterator(I)                 std::prev(I).getReverse()
      reverse_iterator(I)                          ++I.getReverse()
    --reverse_iterator(I)                            I.getReverse()
      reverse_iterator(std::next(I))                 I.getReverse()
                RI.base()                std::prev(RI).getReverse()
                RI.base()                         ++RI.getReverse()
              --RI.base()                           RI.getReverse()
     std::next(RI).base()                           RI.getReverse()

(For more details, have a look at r280032.)

llvm-svn: 281172
2016-09-11 18:51:28 +00:00
Duncan P. N. Exon Smith cc9edace0c CodeGen: Turn on sentinel tracking for MachineInstr iterators
This is a prep commit before fixing MachineBasicBlock::reverse_iterator
invalidation semantics, ala r281167 for ilist::reverse_iterator.  This
changes MachineBasicBlock::Instructions to track which node is the
sentinel regardless of LLVM_ENABLE_ABI_BREAKING_CHECKS.

There's almost no functionality change (aside from ABI).  However, in
the rare configuration:

    #if !defined(NDEBUG) && !defined(LLVM_ENABLE_ABI_BREAKING_CHECKS)

the isKnownSentinel() assertions in ilist_iterator<>::operator* suddenly
have teeth for MachineInstr.  If these assertions start firing for your
out-of-tree backend, have a look at the suggestions in the commit
message for r279314, and at some of the commits leading up to it that
avoid dereferencing the end() iterator.

llvm-svn: 281168
2016-09-11 16:38:18 +00:00
Craig Topper 1f81deee1f [CodeGen] Make the TwoAddressInstructionPass check if the instruction is commutable before calling findCommutedOpIndices for every operand. Also make sure the operand is a register before each call to save some work on commutable instructions that might have an operand.
llvm-svn: 281158
2016-09-11 06:00:15 +00:00
Justin Lebar adbf09e8cf [CodeGen] Split out the notions of MI invariance and MI dereferenceability.
Summary:
An IR load can be invariant, dereferenceable, neither, or both.  But
currently, MI's notion of invariance is IR-invariant &&
IR-dereferenceable.

This patch splits up the notions of invariance and dereferenceability at
the MI level.  It's NFC, so adds some probably-unnecessary
"is-dereferenceable" checks, which we can remove later if desired.

Reviewers: chandlerc, tstellarAMD

Subscribers: jholewinski, arsenm, nemanjai, llvm-commits

Differential Revision: https://reviews.llvm.org/D23371

llvm-svn: 281151
2016-09-11 01:38:58 +00:00
Justin Lebar d98cf00c95 [CodeGen] Rename MachineInstr::isInvariantLoad to isDereferenceableInvariantLoad. NFC
Summary:
I want to separate out the notions of invariance and dereferenceability
at the MI level, so that they correspond to the equivalent concepts at
the IR level.  (Currently an MI load is MI-invariant iff it's
IR-invariant and IR-dereferenceable.)

First step is renaming this function.

Reviewers: chandlerc

Subscribers: MatzeB, jfb, llvm-commits

Differential Revision: https://reviews.llvm.org/D23370

llvm-svn: 281125
2016-09-10 01:03:20 +00:00
Arnold Schwaighofer 7d7b4b4014 Create phi nodes for swifterror values at the end of the phi instructions list
ISel makes assumption about the order of phi nodes.

rdar://28190150

llvm-svn: 281095
2016-09-09 21:18:47 +00:00
Saleem Abdulrasool 92e33a3ebc ARM: move the builtins libcall CC setup
Move the target specific setup into the target specific lowering setup.  As
pointed out by Anton, the initial change was moving this too high up the stack
resulting in a violation of the layering (the target generic code path setup
target specific bits).  Sink this into the ARM specific setup.  NFC.

llvm-svn: 281088
2016-09-09 20:11:31 +00:00
Rui Ueyama a5edf655af Fix another -Wunused-variable for non-assert build.
llvm-svn: 281073
2016-09-09 18:37:08 +00:00
Rui Ueyama 47320da1ee Fix -Wunused-variable for non-assert build.
llvm-svn: 281069
2016-09-09 18:07:33 +00:00
Zachary Turner c6d54da891 [pdb] Write PDB TPI Stream from Yaml.
This writes the full sequence of type records described in
Yaml to the TPI stream of the PDB file.

Reviewed By: rnk
Differential Revision: https://reviews.llvm.org/D24316

llvm-svn: 281063
2016-09-09 17:46:17 +00:00
Reid Kleckner 1076288e22 [codeview] Don't assert if the array element type is incomplete
This can happen when the frontend knows the debug info will be emitted
somewhere else. Usually this happens for dynamic classes with out of
line constructors or key functions, but it can also happen when modules
are enabled.

llvm-svn: 281060
2016-09-09 17:29:36 +00:00
Simon Pilgrim 153b408433 [SelectionDAG] Ensure DAG::getZeroExtendInReg is called with a scalar type
Fixes issue with rL280927 identified by Mikael Holmén

llvm-svn: 281042
2016-09-09 13:31:52 +00:00
Tim Northover 25d1286e5a GlobalISel: remove G_TYPE and G_PHI
These instructions were only necessary when type information was stored in the
MachineInstr (because only generic MachineInstrs possessed a type). Now that
it's in MachineRegisterInfo, COPY and PHI work fine.

llvm-svn: 281037
2016-09-09 11:47:31 +00:00
Tim Northover 1f8b1db93e GlobalISel: fix comments and add assertions for valid instructions.
llvm-svn: 281036
2016-09-09 11:46:58 +00:00
Tim Northover 0f140c769a GlobalISel: move type information to MachineRegisterInfo.
We want each register to have a canonical type, which means the best place to
store this is in MachineRegisterInfo rather than on every MachineInstr that
happens to use or define that register.

Most changes following from this are pretty simple (you need an MRI anyway if
you're going to be doing any transformations, so just check the type there).
But legalization doesn't really want to check redundant operands (when, for
example, a G_ADD only ever has one type) so I've made use of MCInstrDesc's
operand type field to encode these constraints and limit legalization's work.

As an added bonus, more validation is possible, both in MachineVerifier and
MachineIRBuilder (coming soon).

llvm-svn: 281035
2016-09-09 11:46:34 +00:00
Zachary Turner 35377f88f5 [YAMLIO] Add the ability to map with context.
mapping a yaml field to an object in code has always been
a stateless operation.  You could still pass state by using the
`setContext` function of the YAMLIO object, but this represented
global state for the entire yaml input.  In order to have
context-sensitive state, it is necessary to pass this state in
at the granularity of an individual mapping.

This patch adds support for this type of context-sensitive state.
You simply pass an additional argument of type T to the
`mapRequired` or `mapOptional` functions, and provided you have
specialized a `MappingContextTraits<U, T>` class with the
appropriate mapping function, you can pass this context into
the mapping function.

Reviewed By: chandlerc
Differential Revision: https://reviews.llvm.org/D24162

llvm-svn: 280977
2016-09-08 18:22:44 +00:00
Renato Golin 049f387112 Revert "[XRay] ARM 32-bit no-Thumb support in LLVM"
And associated commits, as they broke the Thumb bots.

This reverts commit r280935.
This reverts commit r280891.
This reverts commit r280888.

llvm-svn: 280967
2016-09-08 17:10:39 +00:00
James Molloy c6a6144966 [SDAGBuilder] Don't create a binary tree for switches in minsize mode
This bloats codesize - all of the non-leaf nodes are extra code.

llvm-svn: 280932
2016-09-08 13:12:22 +00:00
Simon Pilgrim cc7b4b511b [SelectionDAG] Add BUILD_VECTOR support to computeKnownBits and SimplifyDemandedBits
Add the ability to computeKnownBits and SimplifyDemandedBits to extract the known zero/one bits from BUILD_VECTOR, returning the known bits that are shared by every vector element.

This is an initial step towards determining the sign bits of a vector (PR29079).

Differential Revision: https://reviews.llvm.org/D24253

llvm-svn: 280927
2016-09-08 12:57:51 +00:00
Simon Pilgrim a01ee07a19 [DAGCombiner] Enable AND combines of splatted constant vectors
Allow AND combines to use a vector splatted constant as well as a constant scalar.

Preliminary part of D24253.

llvm-svn: 280926
2016-09-08 12:36:39 +00:00
Michael Kuperstein f79af6f8c4 [CGP] Be less conservative about tail-duplicating a ret to allow tail calls
CGP tail-duplicates rets into blocks that end with a call that feed the ret.
This puts the call in tail position, potentially allowing the DAG builder to
lower it as a tail call. To avoid tail duplication in cases where we won't
form the tail call, CGP tried to predict whether this is going to be possible,
and avoids doing it when lowering as a tail call will definitely fail.
However, it was being too conservative by always throwing away calls to
functions with a signext/zeroext attribute on the return type.

Instead, we can use the same logic the builder uses to determine whether the
attributes work out.

Differential Revision: https://reviews.llvm.org/D24315

llvm-svn: 280894
2016-09-08 00:48:37 +00:00
Dean Michael Berris cf3801eee8 [XRay] Remove unused variable
llvm-svn: 280891
2016-09-08 00:38:22 +00:00
Dean Michael Berris 17d94e279e [XRay] ARM 32-bit no-Thumb support in LLVM
This is a port of XRay to ARM 32-bit, without Thumb support yet. The XRay instrumentation support is moving up to AsmPrinter.
This is one of 3 commits to different repositories of XRay ARM port. The other 2 are:

1. https://reviews.llvm.org/D23932 (Clang test)
2. https://reviews.llvm.org/D23933 (compiler-rt)

Differential Revision: https://reviews.llvm.org/D23931

llvm-svn: 280888
2016-09-08 00:19:04 +00:00
Elena Demikhovsky dcc86d5bb6 Shift-left (ISD::SHL) operation crashes on "DAG Legalization" phase.
https://llvm.org/bugs/show_bug.cgi?id=29058.

While node legalization we tried to legalize its operands.
If an operand node is replaced during legalization the user node may be destroyed.

Differential Revision: https://reviews.llvm.org/D24244

llvm-svn: 280862
2016-09-07 20:54:33 +00:00
Michael Kuperstein 71321563de Don't reuse a variable name in a nested scope. NFC.
llvm-svn: 280853
2016-09-07 20:29:49 +00:00
Saleem Abdulrasool 02d9851c1c CodeGen: ensure that libcalls are always AAPCS CC
The original commit was too aggressive about marking LibCalls as AAPCS.  The
libcalls contain libc/libm/libunwind calls which are not AAPCS, but C.

llvm-svn: 280833
2016-09-07 17:56:09 +00:00
Hans Wennborg 75e25f6812 X86: Fold tail calls into conditional branches where possible (PR26302)
When branching to a block that immediately tail calls, it is possible to fold
the call directly into the branch if the call is direct and there is no stack
adjustment, saving one byte.

Example:

  define void @f(i32 %x, i32 %y) {
  entry:
    %p = icmp eq i32 %x, %y
    br i1 %p, label %bb1, label %bb2
  bb1:
    tail call void @foo()
    ret void
  bb2:
    tail call void @bar()
    ret void
  }

before:

  f:
          movl    4(%esp), %eax
          cmpl    8(%esp), %eax
          jne     .LBB0_2
          jmp     foo
  .LBB0_2:
          jmp     bar

after:

  f:
          movl    4(%esp), %eax
          cmpl    8(%esp), %eax
          jne     bar
  .LBB0_1:
          jmp     foo

I don't expect any significant size savings from this (on a Clang bootstrap I
saw 288 bytes), but it does make the code a little tighter.

This patch only does 32-bit, but 64-bit would work similarly.

Differential Revision: https://reviews.llvm.org/D24108

llvm-svn: 280832
2016-09-07 17:52:14 +00:00
Reid Kleckner a9f4cc9510 [codeview] Add new directives to record inlined call site line info
Summary:
Previously we were trying to represent this with the "contains" list of
the .cv_inline_linetable directive, which was not enough information.
Now we directly represent the chain of inlined call sites, so we know
what location to emit when we encounter a .cv_loc directive of an inner
inlined call site while emitting the line table of an outer function or
inlined call site. Fixes PR29146.

Also fixes PR29147, where we would crash when .cv_loc directives crossed
sections. Now we write down the section of the first .cv_loc directive,
and emit an error if any other .cv_loc directive for that function is in
a different section.

Also fixes issues with discontiguous inlined source locations, like in
this example:

  volatile int unlikely_cond = 0;
  extern void __declspec(noreturn) abort();
  __forceinline void f() {
    if (!unlikely_cond) abort();
  }
  int main() {
    unlikely_cond = 0;
    f();
    unlikely_cond = 0;
  }

Previously our tables gave bad location information for the 'abort'
call, and the debugger wouldn't snow the inlined stack frame for 'f'.
It is important to emit good line tables for this code pattern, because
it comes up whenever an asan bug occurs in an inlined function. The
__asan_report* stubs are generally placed after the normal function
epilogue, leading to discontiguous regions of inlined code.

Reviewers: majnemer, amccarth

Subscribers: llvm-commits

Differential Revision: https://reviews.llvm.org/D24014

llvm-svn: 280822
2016-09-07 16:15:31 +00:00
Matt Arsenault 6cda10c950 Remove unnecessary call to getAllocatableRegClass
This reapplies r252565 and r252674, effectively reverting r252956.

This allows VS_32/VS_64 to be unallocatable like they should be.

llvm-svn: 280783
2016-09-07 06:16:45 +00:00
Saleem Abdulrasool a7ade33d16 Revert "CodeGen: ensure that libcalls are always AAPCS CC"
This reverts SVN r280683.  Revert until I figure out why this is breaking lli
tests.

llvm-svn: 280778
2016-09-07 03:17:19 +00:00
Hal Finkel 8ca2ed22b2 [DAGCombine] More fixups to SETCC legality checking (visitANDLike/visitORLike)
I might have called this "r246507, the sequel". It fixes the same issue, as the
issue has cropped up in a few more places. The underlying problem is that
isSetCCEquivalent can pick up select_cc nodes with a result type that is not
legal for a setcc node to have, and if we use that type to create new setcc
nodes, nothing fixes that (and so we've violated the contract that the
infrastructure has with the backend regarding setcc node types).

Fixes PR30276.

For convenience, here's the commit message from r246507, which explains the
problem is greater detail:

[DAGCombine] Fixup SETCC legality checking

SETCC is one of those special node types for which operation actions (legality,
etc.) is keyed off of an operand type, not the node's value type. This makes
sense because the value type of a legal SETCC node is determined by its
operands' value type (via the TLI function getSetCCResultType). When the
SDAGBuilder creates SETCC nodes, it either creates them with an MVT::i1 value
type, or directly with the value type provided by TLI.getSetCCResultType.

The first problem being fixed here is that DAGCombine had several places
querying TLI.isOperationLegal on SETCC, but providing the return of
getSetCCResultType, instead of the operand type directly. This does not mean
what the author thought, and "luckily", most in-tree targets have SETCC with
Custom lowering, instead of marking them Legal, so these checks return false
anyway.

The second problem being fixed here is that two of the DAGCombines could create
SETCC nodes with arbitrary (integer) value types; specifically, those that
would simplify:

  (setcc a, b, op1) and|or (setcc a, b, op2) -> setcc a, b, op3
     (which is possible for some combinations of (op1, op2))

If the operands of the and|or node are actual setcc nodes, then this is not an
issue (because the and|or must share the same type), but, the relevant code in
DAGCombiner::visitANDLike and DAGCombiner::visitORLike actually calls
DAGCombiner::isSetCCEquivalent on each operand, and that function will
recognise setcc-like select_cc nodes with other return types. And, thus, when
creating new SETCC nodes, we need to be careful to respect the value-type
constraint. This is even true before type legalization, because it is quite
possible for the SELECT_CC node to have a legal type that does not happen to
match the corresponding TLI.getSetCCResultType type.

To be explicit, there is nothing that later fixes the value types of SETCC
nodes (if the type is legal, but does not happen to match
TLI.getSetCCResultType). Creating SETCCs with an MVT::i1 value type seems to
work only because, either MVT::i1 is not legal, or it is what
TLI.getSetCCResultType returns if it is legal. Fixing that is a larger change,
however. For the time being, restrict the relevant transformations to produce
only SETCC nodes with a value type matching TLI.getSetCCResultType (or MVT::i1
prior to type legalization).

Fixes PR24636.

llvm-svn: 280767
2016-09-06 23:02:23 +00:00
Simon Pilgrim 1b4462b7c1 [SelectionDAG] Simplify extract_subvector( insert_subvector ( Vec, In, Idx ), Idx ) -> In
If we are extracting a subvector that has just been inserted then we should just use the original inserted subvector.

This has come up in certain several x86 shuffle lowering cases where we are crossing 128-bit lanes.

Differential Revision: https://reviews.llvm.org/D24254

llvm-svn: 280715
2016-09-06 16:42:05 +00:00
Silviu Baranga 0b7c4af359 [RegisterScavenger] Remove aliasing registers of operands from the candidate set
Summary:
In addition to not including the register operand of the current
instruction also don't include any aliasing registers. We can't consider
these as candidates because using them will clobber the corresponding
register operand of the current instruction.

This change doesn't include a test case and it would probably be difficult
to produce a stable one since the bug depends on the results of register
allocation.

Reviewers: MatzeB, qcolombet, hfinkel

Subscribers: hfinkel, llvm-commits

Differential Revision: https://reviews.llvm.org/D24130

llvm-svn: 280698
2016-09-06 10:10:21 +00:00
Saleem Abdulrasool a6519b1d54 CodeGen: ensure that libcalls are always AAPCS CC
All of the builtins are designed to be invoked with ARM AAPCS CC even on ARM
AAPCS VFP CC hosts.  Tweak the default initialisation to ARM AAPCS CC rather
than C CC for ARM/thumb targets.

The changes to the tests are necessary to ensure that the calling convention for
the lowered library calls are honoured.  Furthermore, these adjustments cause
certain branch invocations to change to branch-and-link since the returned value
needs to be moved across registers (d0 -> r0, r1).

llvm-svn: 280683
2016-09-06 00:28:43 +00:00
Xinliang David Li 241e6c7086 [Profile] preserve branch metadata lowering select in CGP
CGP currently drops select's MD_prof profile data when
generating conditional branch which can lead to bad
code layout. The patch fixes the issue.

Differential Revision: http://reviews.llvm.org/D24169

llvm-svn: 280600
2016-09-03 21:26:36 +00:00
Matt Arsenault f3d1a1a1b6 Improve debug error message with register name
llvm-svn: 280583
2016-09-03 06:57:49 +00:00
Duncan P. N. Exon Smith 8cc24eadd2 ADT: Remove external uses of ilist_iterator, NFC
Delete the dead code for Write(ilist_iterator) in the IR Verifier,
inline report(ilist_iterator) at its call sites in the MachineVerifier,
and use simple_ilist<>::iterator in SymbolTableListTraits.

The only remaining reference to ilist_iterator outside of the ilist
implementation is from MachineInstrBundleIterator.  I'll get rid of that
in a follow-up.

llvm-svn: 280565
2016-09-03 01:22:56 +00:00
Krzysztof Parzyszek 3bf4aeccd6 Do not consider subreg defs as reads when computing subrange liveness
Subregister definitions are considered uses for the purpose of tracking
liveness of the whole register. At the same time, when calculating live
interval subranges, subregister defs should not be treated as uses.

Differential Revision: https://reviews.llvm.org/D24190

llvm-svn: 280532
2016-09-02 19:48:55 +00:00
Kyle Butt e31cc84290 IfConversion: Add assertions that both sides of a diamond don't pred-clobber.
One side of a diamond may end with a predicate clobbering instruction.
That side of the diamond has to be if-converted second. Both sides can't
clobber the predicate or the ifconversion is invalid. This is checked
elsewhere, but add an assert as a safety check. NFC

llvm-svn: 280518
2016-09-02 18:29:28 +00:00
Kyle Butt 8699921c4b IfConversion: Fix bug introduced by rescanning diamonds.
Passing the wrong values for predicate-clobbering. Simple to miss.
Added an assert to make this easier to catch in the future.

llvm-svn: 280517
2016-09-02 18:29:26 +00:00
Wei Mi c54d1298f5 Split the store of a wide value merged from an int-fp pair into multiple stores.
For the store of a wide value merged from a pair of values, especially int-fp pair,
sometimes it is more efficent to split it into separate narrow stores, which can
remove the bitwise instructions or sink them to colder places.

Now the feature is only enabled on x86 target, and only store of int-fp pair is
splitted. It is possible that the application scope gets extended with perf evidence
support in the future.

Differential Revision: https://reviews.llvm.org/D22840

llvm-svn: 280505
2016-09-02 17:17:04 +00:00
Andrea Di Biagio fd503e5af3 [DAGcombiner] Fix incorrect sinking of a truncate into the operand of a shift.
This fixes a regression introduced by revision 268094.
Revision 268094 added the following dag combine rule:
// trunc (shl x, K) -> shl (trunc x), K => K < vt.size / 2

That rule converts a truncate of a shift-by-constant into a shift of a truncated
value. We do this only if the shift count is less than half the size in bits of
the truncated value (K < vt.size / 2).

The problem is that the constraint on the shift count is incorrect, so the rule
doesn't work well in some cases involving vector types. The combine rule should
have been written instead like this:
// trunc (shl x, K) -> shl (trunc x), K => K < vt.getScalarSizeInBits()

Basically, if K is smaller than the "scalar size in bits" of the truncated value
then we know that by "sinking" the truncate into the operand of the shift we
would never accidentally make the shift undefined.

This patch fixes the check on the shift count, and adds test cases to make sure
that we don't regress the behavior.

Differential Revision: https://reviews.llvm.org/D24154

llvm-svn: 280482
2016-09-02 11:29:09 +00:00
Kyle Butt 93e94e8a12 IfConversion: Don't count branches in # of duplicates.
If the entire blocks match, we would count the branch instructions
toward the number of duplicated instructions. This doesn't match what we
do elsewhere, and was causing a bug.

llvm-svn: 280448
2016-09-02 01:20:06 +00:00
Aditya Kumar 356f79d535 [SelectionDAGBuilder] Add const to relevant places
Reviewers: hans, evandro, sebpop

Differential Revision: https://reviews.llvm.org/D24112

llvm-svn: 280430
2016-09-01 23:35:26 +00:00
Michael Kuperstein 7bc54cebea [Legalizer] Don't throw away false low half when expanding GT/LT SETCC
When expanding a SETCC for which the low half is known to evaluate to false,
we can only throw it away for LT/GT comparisons, not LE/GE.

This fixes PR29170.

Differential Revision: https://reviews.llvm.org/D24151

llvm-svn: 280424
2016-09-01 23:02:32 +00:00
Michael Kuperstein 5f17d08f49 [SelectionDAG] Generate vector_shuffle nodes for undersized result vector sizes
Prior to this, we could generate a vector_shuffle from an IR shuffle when the
size of the result was exactly the sum of the sizes of the input vectors.
If the output vector was narrower - e.g. a <12 x i8> being formed by a shuffle
with two <8 x i8> inputs - we would lower the shuffle to a sequence of extracts
and inserts.

Instead, we can form a larger vector_shuffle, and then extract a subvector
of the right size - e.g. shuffle the two <8 x i8> inputs into a <16 x i8>
and then extract a <12 x i8>.

This also includes a target-specific X86 combine that in the presence of
AVX2 combines:
(vector_shuffle <mask> (concat_vectors t1, undef)
                       (concat_vectors t2, undef))
into:
(vector_shuffle <mask> (concat_vectors t1, t2), undef)
in cases where this allows us to form VPERMD/VPERMQ.

(This is not a separate commit, as that pattern does not appear without
the DAGBuilder change.)

llvm-svn: 280418
2016-09-01 21:32:09 +00:00
Tim Northover 8d8812c5d7 GlobalISel: add a G_PHI instruction to give phis a type.
They're another source of generic vregs, which are going to need a type on the
definition when we remove the register width from MachineRegisterInfo.

llvm-svn: 280412
2016-09-01 20:45:41 +00:00
Michael Kuperstein b4743597bd Rename some variables to have meaningful names. NFC.
llvm-svn: 280391
2016-09-01 18:24:42 +00:00
Michael Kuperstein 65bc3c89ff [DAGCombine] Don't fold a trunc if it feeds an anyext
Legalization tends to create anyext(trunc) patterns. This should always be
combined - into either a single trunc, a single ext, or nothing if the
types match exactly. But if we happen to combine the trunc first, we may pull
the trunc away from the anyext or make it implicit (e.g. the truncate(extract)
-> extract(bitcast) fold).

To prevent this, we can avoid doing the fold, similarly to how we already handle
fpround(fpextend).

Differential Revision: https://reviews.llvm.org/D23893

llvm-svn: 280386
2016-09-01 17:59:24 +00:00
Krzysztof Parzyszek 4f863d75f4 Add an optional parameter with a list of undefs to extendToIndices
Reapply r280268, hopefully in a version that MSVC likes.

llvm-svn: 280358
2016-09-01 12:10:36 +00:00
Hal Finkel 5081ac27c7 Add ISD::EH_DWARF_CFA, simplify @llvm.eh.dwarf.cfa on Mips, fix on PowerPC
LLVM has an @llvm.eh.dwarf.cfa intrinsic, used to lower the GCC-compatible
__builtin_dwarf_cfa() builtin. As pointed out in PR26761, this is currently
broken on PowerPC (and likely on ARM as well). Currently, @llvm.eh.dwarf.cfa is
lowered using:

  ADD(FRAMEADDR, FRAME_TO_ARGS_OFFSET)

where FRAME_TO_ARGS_OFFSET defaults to the constant zero. On x86,
FRAME_TO_ARGS_OFFSET is lowered to 2*SlotSize. This setup, however, does not
work for PowerPC. Because of the way that the stack layout works, the canonical
frame address is not exactly (FRAMEADDR + FRAME_TO_ARGS_OFFSET) on PowerPC
(there is a lower save-area offset as well), so it is not just a matter of
implementing FRAME_TO_ARGS_OFFSET for PowerPC (unless we redefine its
semantics -- We can do that, since it is currently used only for
@llvm.eh.dwarf.cfa lowering, but the better to directly lower the CFA construct
itself (since it can be easily represented as a fixed-offset FrameIndex)). Mips
currently does this, but by using a custom lowering for ADD that specifically
recognizes the (FRAMEADDR, FRAME_TO_ARGS_OFFSET) pattern.

This change introduces a ISD::EH_DWARF_CFA node, which by default expands using
the existing logic, but can be directly lowered by the target. Mips is updated
to use this method (which simplifies its implementation, and I suspect makes it
more robust), and updates PowerPC to do the same.

Fixes PR26761.

Differential Revision: https://reviews.llvm.org/D24038

llvm-svn: 280350
2016-09-01 10:28:47 +00:00
Hal Finkel 40d7f5c277 Add a counter-function insertion pass
As discussed in https://reviews.llvm.org/D22666, our current mechanism to
support -pg profiling, where we insert calls to mcount(), or some similar
function, is fundamentally broken. We insert these calls in the frontend, which
means they get duplicated when inlining, and so the accumulated execution
counts for the inlined-into functions are wrong.

Because we don't want the presence of these functions to affect optimizaton,
they should be inserted in the backend. Here's a pass which would do just that.
The knowledge of the name of the counting function lives in the frontend, so
we're passing it here as a function attribute. Clang will be updated to use
this mechanism.

Differential Revision: https://reviews.llvm.org/D22825

llvm-svn: 280347
2016-09-01 09:42:39 +00:00
Dean Michael Berris e8ae5baaf7 [XRay] Detect and emit sleds for sibling/tail calls
Summary:
This change promotes the 'isTailCall(...)' member function to
TargetInstrInfo as a query interface for determining on a per-target
basis whether a given MachineInstr is a tail call instruction. We build
upon this in the XRay instrumentation pass to emit special sleds for
tail call optimisations, where we emit the correct kind of sled.

The tail call sleds look like a mix between the function entry and
function exit sleds. Form-wise, the sled comes before the "jmp"
instruction that implements the tail call similar to how we do it for
the function entry sled. Functionally, because we know this is a tail
call, it behaves much like an exit sled -- i.e. at runtime we may use
the exit trampolines instead of a different kind of trampoline.

A follow-up change to recognise these sleds will be done in compiler-rt,
so that we can start intercepting these initially as exits, but also
have the option to have different log entries to more accurately reflect
that this is actually a tail call.

Reviewers: echristo, rSerge, majnemer

Subscribers: mehdi_amini, dberris, llvm-commits

Differential Revision: https://reviews.llvm.org/D23986

llvm-svn: 280334
2016-09-01 01:29:13 +00:00
Reid Kleckner 109448ee81 Revert "Add an optional parameter with a list of undefs to extendToIndices"
This reverts commit r280268, it causes all MSVC 2013 to ICE. This
appears to have been fixed in a later MSVC 2013 update, because I cannot
reproduce it locally. That said, all upstream LLVM bots are broken right
now, so I am reverting.

Also reverts dependent change r280275, "[Hexagon] Deal with undefs when
extending live intervals".

llvm-svn: 280301
2016-08-31 22:36:02 +00:00
Tim Northover 11a2354670 GlobalISel: use G_TYPE to annotate physregs with a type.
More preparation for dropping source types from MachineInstrs: regsters coming
out of already-selected code (i.e. non-generic instructions) don't have a type,
but that information is needed so we must add it manually.

This is done via a new G_TYPE instruction.

llvm-svn: 280292
2016-08-31 21:24:02 +00:00
Quentin Colombet 1c06a73a7c [TargetPassConfig] Add a hook to tell whether GlobalISel should warm on fallback.
Thanks to this patch, we know have a way to easly see if GlobalISel
failed.

llvm-svn: 280273
2016-08-31 18:43:04 +00:00
Quentin Colombet 612cd1fdd6 [ResetMachineFunction] Emit the diagnostic isel fallback when asked.
This pass is now able to report when the function is being reset.

llvm-svn: 280272
2016-08-31 18:43:01 +00:00
Krzysztof Parzyszek 576225daf5 Add an optional parameter with a list of undefs to extendToIndices
llvm-svn: 280268
2016-08-31 18:02:19 +00:00
Tim Shen 48f814e8a3 s/static inline/static/ for headers I have changed in r279475. NFC.
llvm-svn: 280257
2016-08-31 16:48:13 +00:00
Reid Kleckner 9dac47319d [codeview] Emit vtable shape information
The shape of the vtable is passed down as the size of the
__vtbl_ptr_type. This special pointer type appears both as the pointee
type of the vptr type, and by itself in every dynamic class. For classes
with multiple vtables, only the shape of the primary vftable is
included, as the shape of all secondary vftables will be the same as in
the base class.

Fixes PR28150

llvm-svn: 280254
2016-08-31 15:59:30 +00:00
Philip Reames 2b1084ac93 [statepoints][experimental] Add support for live-in semantics of values in deopt bundles
This is a first step towards supporting deopt value lowering and reporting entirely with the register allocator. I hope to build on this in the near future to support live-on-return semantics, but I have a use case which allows me to test and investigate code quality with just the live-in semantics so I've chosen to start there. For those curious, my use cases is our implementation of the "__llvm_deoptimize" function we bind to @llvm.deoptimize. I'm choosing not to hard code that fact in the patch and instead make it configurable via function attributes.

The basic approach here is modelled on what is done for the "Live In" values on stackmaps and patchpoints. (A secondary goal here is to remove one of the last barriers to merging the pseudo instructions.) We start by adding the operands directly to the STATEPOINT SDNode. Once we've lowered to MI, we extend the remat logic used by the register allocator to fold virtual register uses into StackMap::Indirect entries as needed. This does rely on the fact that the register allocator rematerializes. If it didn't along some code path, we could end up with more vregs than physical registers and fail to allocate.

Today, we *only* fold in the register allocator. This can create some weird effects when combined with arguments passed on the stack because we don't fold them appropriately. I have an idea how to fix that, but it needs this patch in place to work on that effectively. (There's some weird interaction with the scheduler as well, more investigation needed.)

My near term plan is to land this patch off-by-default, experiment in my local tree to identify any correctness issues and then start fixing codegen problems one by one as I find them. Once I have the live-in lowering fully working (both correctness and code quality), I'm hoping to move on to the live-on-return semantics. Note: I don't have any *known* miscompiles with this patch enabled, but I'm pretty sure I'll find at least a couple. Thus, the "experimental" tag and the fact it's off by default.

Differential Revision: https://reviews.llvm.org/D24000

llvm-svn: 280250
2016-08-31 15:12:17 +00:00
Krzysztof Parzyszek 64488118bf Fixed spill stack objects are mutable
Differential Revision: https://reviews.llvm.org/D24039

llvm-svn: 280244
2016-08-31 13:52:17 +00:00
Dean Michael Berris 047669f18c [XRay] Support multiple return instructions in a single basic block
Add a .mir test to catch this case, and fix the xray-instrumentation
pass to handle it appropriately.

llvm-svn: 280192
2016-08-31 05:20:08 +00:00
Reid Kleckner dbaa61cbe3 [codeview] Remove redundant TypeTable lookup
As written, the code should assert if this lookup would have ever
succeeded.  Without looking through composite types, the type graph
should be acyclic.

llvm-svn: 280168
2016-08-30 21:48:14 +00:00
Tim Northover 991b12bf09 GlobalISel: combine extracts & sequences created for legalization
Legalization ends up creating many G_SEQUENCE/G_EXTRACT pairs which leads to
inefficient codegen (even for -O0), so add a quick pass over the function to
remove them again.

llvm-svn: 280155
2016-08-30 20:51:25 +00:00
Duncan P. N. Exon Smith 73b8dbdd94 CodeGen: Fixup for r280128, since GCC isn't as permissive as Clang
Fixes the bots, e.g.:
http://lab.llvm.org:8011/builders/lldb-x86_64-ubuntu-14.04-buildserver/builds/10055

llvm-svn: 280135
2016-08-30 19:11:11 +00:00
Tim Northover e5102de678 GlobalISel: forbid physical registers on generic MIs.
We're intending to move to a world where the type of a register is determined
by its (unique) def. This is incompatible with physregs, which are untyped.

It also means the other passes don't have to worry quite so much about
register-class compatibility and inserting COPYs appropriately.

llvm-svn: 280132
2016-08-30 18:52:46 +00:00
Duncan P. N. Exon Smith f947c3afe1 ADT: Split ilist_node_traits into alloc and callback, NFC
Many lists want to override only allocation semantics, or callbacks for
iplist.  Split these up to prevent code duplication.
- Specialize ilist_alloc_traits to change the implementations of
  deleteNode() and createNode().
- One common desire is to do nothing deleteNode() and disable
  createNode().  Specialize ilist_alloc_traits to inherit from
  ilist_noalloc_traits for that behaviour.
- Specialize ilist_callback_traits to use the addNodeToList(),
  removeNodeFromList(), and transferNodesFromList() callbacks.

As a drive-by, add some coverage to the callback-related unit tests.

llvm-svn: 280128
2016-08-30 18:40:47 +00:00
Kyle Butt 61aca6ef79 TailDuplication: Extract Indirect-Branch block limit as option. NFC
The existing code hard-coded a limit of 20 instructions for duplication
when a block ended with an indirect branch. Extract this as an option.
No functional change intended.

llvm-svn: 280125
2016-08-30 18:18:54 +00:00
Duncan P. N. Exon Smith b7668d5164 ADT: Guarantee transferNodesFromList is only called on transfers
Guarantee that ilist_traits<T>::transferNodesFromList is only called
when nodes are actually changing lists.

I also moved all the callbacks to occur *first*, before the operation.
This is the only choice for iplist<T>::merge, so we might as well be
consistent.  I expect this to have no effect in practice, although it
simplifies the logic in both iplist<T>::transfer and iplist<T>::insert.

llvm-svn: 280122
2016-08-30 18:00:45 +00:00
Sanjoy Das 6d3c9132e3 Fix coding style; NFC
Avoid variables starting with lowercase.

llvm-svn: 280048
2016-08-30 01:38:59 +00:00
Duncan P. N. Exon Smith 5c001c367f ADT: Give ilist<T>::reverse_iterator a handle to the current node
Reverse iterators to doubly-linked lists can be simpler (and cheaper)
than std::reverse_iterator.  Make it so.

In particular, change ilist<T>::reverse_iterator so that it is *never*
invalidated unless the node it references is deleted.  This matches the
guarantees of ilist<T>::iterator.

(Note: MachineBasicBlock::iterator is *not* an ilist iterator, but a
MachineInstrBundleIterator<MachineInstr>.  This commit does not change
MachineBasicBlock::reverse_iterator, but it does update
MachineBasicBlock::reverse_instr_iterator.  See note at end of commit
message for details on bundle iterators.)

Given the list (with the Sentinel showing twice for simplicity):

     [Sentinel] <-> A <-> B <-> [Sentinel]

the following is now true:
 1. begin() represents A.
 2. begin() holds the pointer for A.
 3. end() represents [Sentinel].
 4. end() holds the poitner for [Sentinel].
 5. rbegin() represents B.
 6. rbegin() holds the pointer for B.
 7. rend() represents [Sentinel].
 8. rend() holds the pointer for [Sentinel].

The changes are #6 and #8.  Here are some properties from the old
scheme (which used std::reverse_iterator):
- rbegin() held the pointer for [Sentinel] and rend() held the pointer
  for A;
- operator*() cost two dereferences instead of one;
- converting from a valid iterator to its valid reverse_iterator
  involved a confusing increment; and
- "RI++->erase()" left RI invalid.  The unintuitive replacement was
  "RI->erase(), RE = end()".

With vector-like data structures these properties are hard to avoid
(since past-the-beginning is not a valid pointer), and don't impose a
real cost (since there's still only one dereference, and all iterators
are invalidated on erase).  But with lists, this was a poor design.

Specifically, the following code (which obviously works with normal
iterators) now works with ilist::reverse_iterator as well:

    for (auto RI = L.rbegin(), RE = L.rend(); RI != RE;)
      fooThatMightRemoveArgFromList(*RI++);

Converting between iterator and reverse_iterator for the same node uses
the getReverse() function.

    reverse_iterator iterator::getReverse();
    iterator reverse_iterator::getReverse();

Why doesn't iterator <=> reverse_iterator conversion use constructors?

In order to catch and update old code, reverse_iterator does not even
have an explicit conversion from iterator.  It wouldn't be safe because
there would be no reasonable way to catch all the bugs from the changed
semantic (see the changes at call sites that are part of this patch).

Old code used this API:

    std::reverse_iterator::reverse_iterator(iterator);
    iterator std::reverse_iterator::base();

Here's how to update from old code to new (that incorporates the
semantic change), assuming I is an ilist<>::iterator and RI is an
ilist<>::reverse_iterator:

            [Old]         ==>          [New]
    reverse_iterator(I)       (--I).getReverse()
    reverse_iterator(I)         ++I.getReverse()
  --reverse_iterator(I)           I.getReverse()
    reverse_iterator(++I)         I.getReverse()
          RI.base()          (--RI).getReverse()
          RI.base()            ++RI.getReverse()
        --RI.base()              RI.getReverse()
      (++RI).base()              RI.getReverse()
  delete &*RI, RE = end()         delete &*RI++
  RI->erase(), RE = end()         RI++->erase()

=======================================
Note: bundle iterators are out of scope
=======================================

MachineBasicBlock::iterator, also known as
MachineInstrBundleIterator<MachineInstr>, is a wrapper to represent
MachineInstr bundles.  The idea is that each operator++ takes you to the
beginning of the next bundle.  Implementing a sane reverse iterator for
this is harder than ilist.  Here are the options:
- Use std::reverse_iterator<MBB::i>.  Store a handle to the beginning of
  the next bundle.  A call to operator*() runs a loop (usually
  operator--() will be called 1 time, for unbundled instructions).
  Increment/decrement just works.  This is the status quo.
- Store a handle to the final node in the bundle.  A call to operator*()
  still runs a loop, but it iterates one time fewer (usually
  operator--() will be called 0 times, for unbundled instructions).
  Increment/decrement just works.
- Make the ilist_sentinel<MachineInstr> *always* store that it's the
  sentinel (instead of just in asserts mode).  Then the bundle iterator
  can sniff the sentinel bit in operator++().

I initially tried implementing the end() option as part of this commit,
but updating iterator/reverse_iterator conversion call sites was
error-prone.  I have a WIP series of patches that implements the final
option.

llvm-svn: 280032
2016-08-30 00:13:12 +00:00
Tim Northover f8bab1ce0c GlobalISel: use multi-dimensional arrays for legalize actions.
Instead of putting all possible requests into a single table, we can perform
the extremely dense lookup based on opcode and type-index in constant time
using multi-dimensional array-like things.

This roughly halves the time spent doing legalization, which was dominated by
queries against the Actions table.

llvm-svn: 280011
2016-08-29 21:00:00 +00:00
Krzysztof Parzyszek 354832e585 Propagate TBAA info in SelectionDAG::getIndexedLoad
Patch by Pranav Bhandarkar.

llvm-svn: 279998
2016-08-29 19:50:15 +00:00
Tim Northover ac5148ef41 GlobalISel: switch to SmallVector for pending legalizations.
std::queue was doing far to many heap allocations to be healthy.

llvm-svn: 279992
2016-08-29 19:27:20 +00:00
Tim Northover edb3c8ccb8 GlobalISel: legalize frem to a libcall on AArch64.
llvm-svn: 279988
2016-08-29 19:07:16 +00:00
Tim Northover fe5f89ba14 GlobalISel: rework CallLowering so that it can be used for libcalls too.
There should be no functional change here, I'm just making the implementation
of "frem" (to libcall) legalization easier for a followup.

llvm-svn: 279987
2016-08-29 19:07:08 +00:00
Kyle Butt 092c4dd5b6 IfConversion: Fix branch predication bug.
This bug shows up with diamonds that share unpredicable, unanalyzable branches.
There's an included test case from Hexagon. What was happening was that we were
attempting to predicate the branch instruction despite the fact that it was
checked to be the same. Now for unanalyzable branches we skip over the branch
instructions when predicating the block.

Differential Revision: https://reviews.llvm.org/D23939

llvm-svn: 279985
2016-08-29 18:27:12 +00:00
Sanjay Patel b57d0a2fda [TargetLowering] remove fdiv and frem from canOpTrap() (PR29114)
Assuming the default FP env, we should not treat fdiv and frem any differently in terms of 
trapping behavior than any other FP op. Ie, FP ops do not trap with the default FP env.

This matches how we treat these ops in IR with isSafeToSpeculativelyExecute(). There's a 
similar bug in Constant::canTrap().

This bug manifests in PR29114:
https://llvm.org/bugs/show_bug.cgi?id=29114
...as a sequence of scalar divisions instead of a vector division on x86 for a <3 x float> 
type.

Differential Revision: https://reviews.llvm.org/D23974

llvm-svn: 279970
2016-08-29 13:32:41 +00:00
Krzysztof Parzyszek 0a955d6dcb Do not use MRI::getMaxLaneMaskForVReg as a mask covering whole register
MRI::getMaxLaneMaskForVReg does not always cover the whole register.
For example, on X86 the upper 16 bits of EAX cannot be accessed via
any subregister. Consequently, there is no lane mask that only covers
that part of EAX. The getMaxLaneMaskForVReg will return the union of
the lane masks for all subregisters, and in case of EAX, that union
will not cover the upper 16 bits.

This fixes https://llvm.org/bugs/show_bug.cgi?id=29132

llvm-svn: 279969
2016-08-29 13:15:35 +00:00
Rafael Espindola 412a529551 Use the correct ctor/dtor section for dynamic-no-pic.
llvm-svn: 279967
2016-08-29 12:47:22 +00:00
Rafael Espindola 46fa231c52 Move code only used by codegen out of MC. NFC.
MC itself never needs to know about these sections.

llvm-svn: 279965
2016-08-29 12:33:42 +00:00
Igor Breger 24281b4740 Fixed a bug in type legalizer for masked gather.
The problem occurs when the Node doesn't updated in place , UpdateNodeOperation() return the node that already exist.
In this case assert fail in PromoteIntegerOperand() , N have 2 results ( val + chain).

Differential Revision: http://reviews.llvm.org/D23756

llvm-svn: 279961
2016-08-29 09:12:31 +00:00
Haojian Wu 407f275894 [InstructionSelect] NumBlocks isn't defined in DEBUG build.
Summary: A follow-up fixing on http://llvm.org/viewvc/llvm-project?view=revision&revision=279905.

Reviewers: bkramer

Subscribers: cfe-commits

Differential Revision: https://reviews.llvm.org/D23985

llvm-svn: 279959
2016-08-29 08:48:15 +00:00
Quentin Colombet acb857b831 [RegBankSelect] Do not abort when the target wants to fall back.
llvm-svn: 279906
2016-08-27 02:38:27 +00:00
Quentin Colombet 948abf0a0f [InstructionSelect] Do not abort when the target wants to fall back.
llvm-svn: 279905
2016-08-27 02:38:24 +00:00
Quentin Colombet 5e60bcdeaf [MachineLegalize] Do not abort when the target wants to fall back.
llvm-svn: 279904
2016-08-27 02:38:21 +00:00
Quentin Colombet 374796d678 [GlobalISel] Add a fallback path to SDISel.
When global-isel fails on a MachineFunction MF, MF will be cleaned up
and given to SDISel.
Thanks to this fallback, we can already perform correctness test even if
we support only a small portion of the functions in a test.

llvm-svn: 279891
2016-08-27 00:18:31 +00:00
Quentin Colombet 6049524d37 [GlobalISel] Teach the core pipeline not to run if ISel failed.
llvm-svn: 279889
2016-08-27 00:18:24 +00:00
Quentin Colombet 3bb32cc79c [IRTranslator] Do not abort when the target wants to fall back.
Every pass in the GlobalISel pipeline will need to do something similar.

llvm-svn: 279886
2016-08-26 23:49:05 +00:00
Quentin Colombet e076d3094c [MFProperties] Introduce a FailedISel property.
This is used to communicate that the instruction selection pipeline
failed at some point.
Another way to achieve that would be to have some kind of conditional
scheduling in the PassManager, such that we only schedule a pass based
on the success/failure of another one. The property approach has the
advantage of being lightweight and solve the problem at stake.

llvm-svn: 279885
2016-08-26 23:49:01 +00:00
Quentin Colombet 0de43b225f [TargetPassConfig] Add a target hook to know what GlobalISel should do on error.
By default, this hook tells GlobalISel to abort (report a fatal error)
when it encounters an error. The alternative will be to fall back on
SDISel.
This fall back will be removed when the bring-up of GlobalISel is over.

llvm-svn: 279879
2016-08-26 22:32:59 +00:00
Quentin Colombet 1d0cb6f107 [IRTranslator][NFC] Use DEBUG_TYPE instead of repeating the name.
llvm-svn: 279878
2016-08-26 22:32:57 +00:00
Quentin Colombet e063e1f68a [SelectionDAG] Do not run the ISel process on already selected code.
Right now, this cannot happen, but with the fall back path of GlobalISel
it will show up eventually.

llvm-svn: 279877
2016-08-26 22:32:55 +00:00
Quentin Colombet 380cd3eb23 [MachineFunction] Introduce a reset method.
This method allows to reset the state of a MachineFunction as if it was
just created. This will be used during the bring-up of GlobalISel to
provide a way to fallback on SelectionDAG. That way, we can start doing
correctness testing even if we are not able to select all functions via
the global instruction selector.

llvm-svn: 279876
2016-08-26 22:32:53 +00:00
Quentin Colombet e609a9a80a [MFProperties] Introduce a reset method with no argument.
This method allows to reset all the properties in one go.

llvm-svn: 279874
2016-08-26 22:09:11 +00:00
Quentin Colombet c437aa9c26 [MFProperties][NFC] Rename clear into reset to match BitVector naming.
The name clear is used to reset all the bit in bitvectors and using it
to reset just properties was confusing.

llvm-svn: 279873
2016-08-26 22:09:08 +00:00
Kyle Butt 723aa1327c TailDuplication: Record blocks that received the duplicated block. NFC.
This will allow tail duplication during layout to handle the cfg changes more
cleanly.

llvm-svn: 279858
2016-08-26 20:12:40 +00:00
Reid Kleckner a5b1eef846 [MC] Move .cv_loc management logic out of MCContext
MCContext already has many tasks, and separating CodeView out from it is
probably a good idea. The .cv_loc tracking was modelled on the DWARF
tracking which lived directly in MCContext.

Removes the inclusion of MCCodeView.h from MCContext.h, so now there are
only 10 build actions while I hack on CodeView support instead of 265.

llvm-svn: 279847
2016-08-26 17:58:37 +00:00
Tim Northover 051b8ad3d9 GlobalISel: simplify G_ICMP legalization regime.
It's unclear how the old

    %res(32) = G_ICMP { s32, s32 } intpred(eq), %0, %1

is actually different from an s1 verison

    %res(1) = G_ICMP { s1, s32 } intpred(eq), %0, %1

so we'll remove it for now.

llvm-svn: 279843
2016-08-26 17:46:17 +00:00
Tim Northover cecee56abb GlobalISel: legalize sdiv and srem operations.
llvm-svn: 279842
2016-08-26 17:46:13 +00:00
Tim Northover 7a753d9bec GlobalISel: legalize under-width divisions.
llvm-svn: 279841
2016-08-26 17:46:06 +00:00
Krzysztof Parzyszek fb18d1e381 Missed a semicolon in r279835
llvm-svn: 279836
2016-08-26 16:50:57 +00:00
Krzysztof Parzyszek eb34b71f0a Add some more detailed debugging information in RegisterCoalescer
llvm-svn: 279835
2016-08-26 16:46:14 +00:00
Matt Arsenault f403df38eb Replace subregister uses when processing tied operands
This was for some reason skipping operands that are subregisters
instead of keeping the same subregister index.

v_movreld_b32 expects src0 to be the subregister of the tied
super register use/def.

e.g.

v_movreld_b32 v0, v9, <imp-def, tied3> v[0:3], <imp-use, tied2> v[0:3]

was being replaced with

v[4:7] = copy v[0:3]
v_movreld_b32 v0, v9, <imp-def, tied3> v[4:7], <imp-use, tied2> v[4:7],

which really writes to v[0:3]

llvm-svn: 279804
2016-08-26 06:31:32 +00:00
Michael Kuperstein 260daed147 Reuse an SDLoc throughout a function. NFC.
llvm-svn: 279767
2016-08-25 18:50:56 +00:00
Tim Northover 6c43b850b7 GlobalISel: add missing type to G_UADDE instructions
llvm-svn: 279762
2016-08-25 17:37:44 +00:00
Tim Northover 438c77ca1a GlobalISel: perform multi-step legalization
llvm-svn: 279758
2016-08-25 17:37:32 +00:00
George Burgess IV b42e0e7fa3 Make buildbots happy.
"warning: extra ‘;’ [-Wpedantic]"

llvm-svn: 279703
2016-08-25 02:15:54 +00:00
Kyle Butt c7f1eac514 TailDuplication: Don't pass MMI separately from MF. NFC
MMI must match the function passed, and MF has a handle on MMI. Use that instead
of accepting it as separate argument. No Functional Change.

llvm-svn: 279701
2016-08-25 01:37:07 +00:00
Kyle Butt 3ed4273d33 TailDuplication: Save MF and reduce number of parameters. NFC
Save the function in the class, and then don't pass it around. This reduces the
number of parameters and makes calls to member functions simpler.
No Functional Change.

llvm-svn: 279700
2016-08-25 01:37:03 +00:00
Matthias Braun 1eb473680a MachineFunctionProperties/MIRParser: Rename AllVRegsAllocated->NoVRegs, compute it
Rename AllVRegsAllocated to NoVRegs. This avoids the connotation of
running after register and simply describes that no vregs are used in
a machine function. With that we can simply compute the property and do
not need to dump/parse it in .mir files.

Differential Revision: http://reviews.llvm.org/D23850

llvm-svn: 279698
2016-08-25 01:27:13 +00:00
George Burgess IV 381fc0ee3c Make some LLVM_CONSTEXPR variables const. NFC.
This patch changes LLVM_CONSTEXPR variable declarations to const
variable declarations, since LLVM_CONSTEXPR expands to nothing if the
current compiler doesn't support constexpr. In all of the changed
cases, it looks like the code intended the variable to be const instead
of sometimes-constexpr sometimes-not.

llvm-svn: 279696
2016-08-25 01:05:08 +00:00
Eugene Zelenko 1804a77b2a Fix some Clang-tidy modernize-use-using and Include What You Use warnings; other minor fixes.
Differential revision: https://reviews.llvm.org/D23861

llvm-svn: 279695
2016-08-25 00:45:04 +00:00
Matthias Braun a319e2cae0 MIRParser/MIRPrinter: Compute HasInlineAsm instead of printing/parsing it
llvm-svn: 279680
2016-08-24 22:34:06 +00:00
Matthias Braun f1b20c5225 MachineRegisterInfo/MIR: Initialize tracksSubRegLiveness early, do not print/parser it
tracksSubRegLiveness only depends on the Subtarget and a cl::opt, there
is not need to change it or save/parse it in a .mir file.
Make the field const and move the initialization LiveIntervalAnalysis to the
MachineRegisterInfo constructor. Also cleanup some code and fix some
instances which better use MachineRegisterInfo::subRegLivenessEnabled() instead
of TargetSubtargetInfo::enableSubRegLiveness().

llvm-svn: 279676
2016-08-24 22:17:45 +00:00
Kyle Butt a8c7371d16 CodeGen: If Convert blocks that would form a diamond when tail-merged.
The following function currently relies on tail-merging for if
conversion to succeed. The common tail of cond_true and cond_false is
extracted, and this then forms a diamond pattern that can be
successfully if converted.

If this block does not get extracted, either because tail-merging is
disabled or the threshold is higher, we should still recognize this
pattern and if-convert it.

Fixed a regression in the original commit. Need to un-reverse branches after
reversing them, or other conversions go awry.

define i32 @t2(i32 %a, i32 %b) nounwind {
entry:
        %tmp1434 = icmp eq i32 %a, %b           ; <i1> [#uses=1]
        br i1 %tmp1434, label %bb17, label %bb.outer

bb.outer:               ; preds = %cond_false, %entry
        %b_addr.021.0.ph = phi i32 [ %b, %entry ], [ %tmp10, %cond_false ]
        %a_addr.026.0.ph = phi i32 [ %a, %entry ], [ %a_addr.026.0, %cond_false ]
        br label %bb

bb:             ; preds = %cond_true, %bb.outer
        %indvar = phi i32 [ 0, %bb.outer ], [ %indvar.next, %cond_true ]
        %tmp. = sub i32 0, %b_addr.021.0.ph
        %tmp.40 = mul i32 %indvar, %tmp.
        %a_addr.026.0 = add i32 %tmp.40, %a_addr.026.0.ph
        %tmp3 = icmp sgt i32 %a_addr.026.0, %b_addr.021.0.ph
        br i1 %tmp3, label %cond_true, label %cond_false

cond_true:              ; preds = %bb
        %tmp7 = sub i32 %a_addr.026.0, %b_addr.021.0.ph
        %tmp1437 = icmp eq i32 %tmp7, %b_addr.021.0.ph
        %indvar.next = add i32 %indvar, 1
        br i1 %tmp1437, label %bb17, label %bb

cond_false:             ; preds = %bb
        %tmp10 = sub i32 %b_addr.021.0.ph, %a_addr.026.0
        %tmp14 = icmp eq i32 %a_addr.026.0, %tmp10
        br i1 %tmp14, label %bb17, label %bb.outer

bb17:           ; preds = %cond_false, %cond_true, %entry
        %a_addr.026.1 = phi i32 [ %a, %entry ], [ %tmp7, %cond_true ], [ %a_addr.026.0, %cond_false ]
        ret i32 %a_addr.026.1
}

Without tail-merging or diamond-tail if conversion:
LBB1_1:                                 @ %bb
                                        @ =>This Inner Loop Header: Depth=1
        cmp     r0, r1
        ble     LBB1_3
@ BB#2:                                 @ %cond_true
                                        @   in Loop: Header=BB1_1 Depth=1
        subs    r0, r0, r1
        cmp     r1, r0
        it      ne
        cmpne   r0, r1
        bgt     LBB1_4
LBB1_3:                                 @ %cond_false
                                        @   in Loop: Header=BB1_1 Depth=1
        subs    r1, r1, r0
        cmp     r1, r0
        bne     LBB1_1
LBB1_4:                                 @ %bb17
        bx      lr

With diamond-tail if conversion, but without tail-merging:
@ BB#0:                                 @ %entry
        cmp     r0, r1
        it      eq
        bxeq    lr
LBB1_1:                                 @ %bb
                                        @ =>This Inner Loop Header: Depth=1
        cmp     r0, r1
        ite     le
        suble   r1, r1, r0
        subgt   r0, r0, r1
        cmp     r1, r0
        bne     LBB1_1
@ BB#2:                                 @ %bb17
        bx      lr

llvm-svn: 279671
2016-08-24 21:34:27 +00:00
Kyle Butt 6262ca3448 IfConversion: Rescan diamonds.
The cost of predicating a diamond is only the instructions that are not shared
between the two branches. Additionally If a predicate clobbering instruction
occurs in the shared portion of the branches (e.g. a cond move), it may still
be possible to if convert the sub-cfg. This change handles these two facts by
rescanning the non-shared portion of a diamond sub-cfg to recalculate both the
predication cost and whether both blocks are pred-clobbering.

Fixed 2 bugs before recommitting. Branch instructions must be compared and found
identical before diamond conversion. Also, predicate-clobbering instructions in
the shared prefix disqualifies a potential diamond conversion. Includes tests
for both.

llvm-svn: 279670
2016-08-24 21:34:24 +00:00
David Blaikie a01f295322 DebugInfo: Add flag to CU to disable emission of inline debug info into the skeleton CU
In cases where .dwo/.dwp files are guaranteed to be available, skipping
the extra online (in the .o file) inline info can save a substantial
amount of space - see the original r221306 for more details there.

llvm-svn: 279650
2016-08-24 18:29:49 +00:00
Krzysztof Parzyszek a7ed090bba Create subranges for new intervals resulting from live interval splitting
The register allocator can split a live interval of a register into a set
of smaller intervals. After the allocation of registers is complete, the
rewriter will modify the IR to replace virtual registers with the corres-
ponding physical registers. At this stage, if a register corresponding
to a subregister of a virtual register is used, the rewriter will check
if that subregister is undefined, and if so, it will add the <undef> flag
to the machine operand. The function verifying liveness of the subregis-
ter would assume that it is undefined, unless any of the subranges of the
live interval proves otherwise.
The problem is that the live intervals created during splitting do not
have any subranges, even if the original parent interval did. This could
result in the <undef> flag placed on a register that is actually defined.

Differential Revision: http://reviews.llvm.org/D21189

llvm-svn: 279625
2016-08-24 13:37:55 +00:00
Matthias Braun 3a133159cc TargetSchedule: Do not consider subregister definitions as reads.
We should not consider subregister definitions as reads for schedule
model purposes (they are just modeled as reads of the overal vreg for
liveness calculation purposes, the CPU instructions are not actually
reading).

Unfortunately I cannot submit a test for this as it requires a target
which uses ReadAdvance annotation in the scheduling model and has
subregister liveness enabled at the same time, which is only the case on
an out of tree target.

llvm-svn: 279604
2016-08-24 02:32:29 +00:00
Matthias Braun 733fe3676c CodeGen: Remove MachineFunctionAnalysis => Enable (Machine)ModulePasses
Re-apply this patch, hopefully I will get away without any warnings
in the constructor now.

This patch removes the MachineFunctionAnalysis. Instead we keep a
map from IR Function to MachineFunction in the MachineModuleInfo.

This allows the insertion of ModulePasses into the codegen pipeline
without breaking it because the MachineFunctionAnalysis gets dropped
before a module pass.

Peak memory should stay unchanged without a ModulePass in the codegen
pipeline: Previously the MachineFunction was freed at the end of a codegen
function pipeline because the MachineFunctionAnalysis was dropped; With
this patch the MachineFunction is freed after the AsmPrinter has
finished.

Differential Revision: http://reviews.llvm.org/D23736

llvm-svn: 279602
2016-08-24 01:52:46 +00:00
Matthias Braun 79f85b3b8f MIRParser/MIRPrinter: Compute isSSA instead of printing/parsing it.
Specifying isSSA is an extra line at best and results in invalid MI at
worst. Compute the value instead.

Differential Revision: http://reviews.llvm.org/D22722

llvm-svn: 279600
2016-08-24 01:32:41 +00:00
Matthias Braun c3b2e80b9d MachineModuleInfo: Avoid dummy constructor, use INITIALIZE_TM_PASS
Change this pass constructor to just accept a const TargetMachine * and
use INITIALIZE_TM_PASS, that way we can get rid of the dummy
constructor. The pass will still fail when calling the default
constructor leading to TM == nullptr, this is no different than before
but is more in line what other codegen passes are doing and avoids the
dummy constructor.

llvm-svn: 279598
2016-08-24 00:42:05 +00:00
Philip Reames d06a1b4cdc [stackmaps] Remove an unneeded member variable [NFC]
llvm-svn: 279590
2016-08-23 23:58:08 +00:00
Philip Reames e83c4b30ca [stackmaps] More extraction of common code [NFCI]
General cleanup before starting to work on the part I want to actually change.

llvm-svn: 279586
2016-08-23 23:33:29 +00:00
Richard Smith 84c4cc47f5 Don't use "return {...}" to initialize a std::tuple. This has only been valid
since 2015 (n4387), though it's allowed by a library DR so new implementations
accept it in their C++11 modes...

This should unbreak the build with libstdc++ 4.9.

llvm-svn: 279583
2016-08-23 22:21:58 +00:00
Richard Smith 418237bed8 #ifdef out validation code when asserts are disabled to remove unused variable
warnings.

llvm-svn: 279582
2016-08-23 22:14:15 +00:00
Richard Smith eae6138936 Remove unused data member to unbreak -Werror builds.
llvm-svn: 279581
2016-08-23 22:10:46 +00:00
Richard Smith 8c3fbdc6c4 Revert r279564. It introduces undefined behavior (binding a reference to a
dereferenced null pointer) in MachineModuleInfo::MachineModuleInfo that causes
-Werror builds (including several buildbots) to fail.

llvm-svn: 279580
2016-08-23 22:08:27 +00:00
Philip Reames 570dd009c3 [stackmaps] Extract out magic constants [NFCI]
This is a first step towards clarifying the exact MI semantics of stackmap's "live values".  

llvm-svn: 279574
2016-08-23 21:21:43 +00:00
Matthias Braun 90799ce8b2 MachineFunction: Introduce NoPHIs property
I want to compute the SSA property of .mir files automatically in
upcoming patches. The problem with this is that some inputs will be
reported as static single assignment with some passes claiming not to
support SSA form.  In reality though those passes do not support PHI
instructions => Track the presence of PHI instructions separate from the
SSA property.

Differential Revision: https://reviews.llvm.org/D22719

llvm-svn: 279573
2016-08-23 21:19:49 +00:00
Tim Northover bdf67c9a00 GlobalISel: make truncate/extend casts uniform
They really should have both types represented, but early variants were created
before MachineInstrs could have multiple types so they're rather ambiguous.

llvm-svn: 279567
2016-08-23 21:01:33 +00:00
Tim Northover 6cd4b23a0f GlobalISel: legalize integer comparisons on AArch64.
Next step is doing both legalizations at the same time! Marvel at GlobalISel's
cunning.

llvm-svn: 279566
2016-08-23 21:01:26 +00:00
Tim Northover b3a0be4d38 GlobalISel: legalize conditional branches on AArch64.
llvm-svn: 279565
2016-08-23 21:01:20 +00:00
Matthias Braun 4c1f1f120c CodeGen: Remove MachineFunctionAnalysis => Enable (Machine)ModulePasses
Re-apply this commit with the deletion of a MachineFunction delegated to
a separate pass to avoid use after free when doing this directly in
AsmPrinter.

This patch removes the MachineFunctionAnalysis. Instead we keep a
map from IR Function to MachineFunction in the MachineModuleInfo.

This allows the insertion of ModulePasses into the codegen pipeline
without breaking it because the MachineFunctionAnalysis gets dropped
before a module pass.

Peak memory should stay unchanged without a ModulePass in the codegen
pipeline: Previously the MachineFunction was freed at the end of a codegen
function pipeline because the MachineFunctionAnalysis was dropped; With
this patch the MachineFunction is freed after the AsmPrinter has
finished.

Differential Revision: http://reviews.llvm.org/D23736

llvm-svn: 279564
2016-08-23 20:58:29 +00:00
Tim Northover a01bece1dc GlobalISel: extend legalizer interface to handle multiple types.
Instructions like G_ICMP have multiple types that may need to be legalized (the
boolean output and nearly arbitrary inputs in this case). So the legalizer must
be capable of deciding what to do for each of them separately.

llvm-svn: 279554
2016-08-23 19:30:42 +00:00
Tim Northover 3c73e367c0 GlobalISel: legalize 1-bit load/store and mark 8/16 bit variants legal on AArch64.
llvm-svn: 279548
2016-08-23 18:20:09 +00:00
Justin Lebar 1972e222ea [SelectionDAG] Use a union of bitfield structs for SDNode::SubclassData.
Summary:
This greatly simplifies our handling of SDNode::SubclassData.

NFC, hopefully.  :)

See discussion in D23035 for discussion about the design API of these
bitfields.

Reviewers: chandlerc

Subscribers: llvm-commits, rnk

Differential Revision: https://reviews.llvm.org/D23036

llvm-svn: 279537
2016-08-23 17:18:11 +00:00
Justin Lebar 0a33a7aefa [CodeGen] Convert a loop to a for-each loop. NFC
llvm-svn: 279536
2016-08-23 17:18:07 +00:00
Pete Cooper 036b94dad3 Fix some more asserts after r279466.
That commit added a new version of Intrinsic::getName which should only
be called when the intrinsic has no overloaded types.  There are several
debugging paths, such as SDNode::dump which are printing the name of the
intrinsic but don't have the overloaded types.  These paths should be ok
to just print the name instead of crashing.

The fix here is ultimately to just add a 'None' second argument as that
calls the overload capable getName, which is less efficient, but this is a
debugging path anyway, and not perf critical.

Thanks to Björn Pettersson for pointing out that there were more crashes.

llvm-svn: 279528
2016-08-23 16:23:45 +00:00
Matthias Braun 7f66202d38 Revert "(HEAD -> master, origin/master, origin/HEAD) CodeGen: Remove MachineFunctionAnalysis => Enable (Machine)ModulePasses"
Reverting while tracking down a use after free.

This reverts commit r279502.

llvm-svn: 279503
2016-08-23 05:17:11 +00:00
Matthias Braun fd936841eb CodeGen: Remove MachineFunctionAnalysis => Enable (Machine)ModulePasses
This patch removes the MachineFunctionAnalysis. Instead we keep a
map from IR Function to MachineFunction in the MachineModuleInfo.

This allows the insertion of ModulePasses into the codegen pipeline
without breaking it because the MachineFunctionAnalysis gets dropped
before a module pass.

Peak memory should stay unchanged without a ModulePass in the codegen
pipeline: Previously the MachineFunction was freed at the end of a codegen
function pipeline because the MachineFunctionAnalysis was dropped; With
this patch the MachineFunction is freed after the AsmPrinter has
finished.

Differential Revision: http://reviews.llvm.org/D23736

llvm-svn: 279502
2016-08-23 03:20:09 +00:00
Pete Cooper 1523925daa Fix crash from assert in r279466.
The assert in r279466 checks that we call the correct version of
Intrinsic::getName.  The version which accepts only an ID should not
be used for intrinsics with overloaded types.  The global-isel
code was calling the wrong version.  The test CodeGen/AArch64/GlobalISel/arm64-irtranslator.ll
will ensure that we call the correct version from now on.

llvm-svn: 279487
2016-08-22 22:27:05 +00:00
Tim Shen f2187ed321 [GraphTraits] Replace all NodeType usage with NodeRef
This should finish the GraphTraits migration.

Differential Revision: http://reviews.llvm.org/D23730

llvm-svn: 279475
2016-08-22 21:09:30 +00:00
Tim Shen a5cc25e50f [SSP] Do not set __guard_local to hidden for OpenBSD SSP
__guard_local is defined as long on OpenBSD. If the source file contains
a definition of __guard_local, it mismatches with the int8 pointer type
used in LLVM. In that case, Module::getOrInsertGlobal() returns a
cast operation instead of a GlobalVariable. Trying to set the
visibility on the cast operation leads to random segfaults (seen when
compiling the OpenBSD kernel, which also runs with stack protection).

In the kernel, the hidden attribute does not matter. For userspace code,
__guard_local is defined as hidden in the startup code. If a program
re-defines __guard_local, the definition from the startup code will
either win or the linker complains about multiple definitions
(depending on whether the re-defined __guard_local is placed in the
common segment or not).

It also matches what gcc on OpenBSD does.

Thanks Stefan Kempf <sisnkemp@gmail.com> for the patch!

Differential Revision: http://reviews.llvm.org/D23674

llvm-svn: 279449
2016-08-22 18:26:27 +00:00
Krzysztof Parzyszek 673b347e5a Reset isUndef when removing subreg from a def operand
llvm-svn: 279437
2016-08-22 14:50:12 +00:00
Simon Pilgrim 02b13d4d3c Use SDValue::getOpcode() helper instead of via SDValue::getNode()
llvm-svn: 279381
2016-08-20 20:04:18 +00:00
Matthias Braun 367d853042 MachineFunction: Add llvm_unreachable for missing properties
Most compilers should give you a warning anyway though.

llvm-svn: 279346
2016-08-19 23:03:28 +00:00
Krzysztof Parzyszek d95d100c28 Reset "undef" flag when coalescing subregister into whole register
llvm-svn: 279344
2016-08-19 22:57:23 +00:00
Tim Northover a11be04769 GlobalISel: support legalization of G_FCONSTANTs
llvm-svn: 279341
2016-08-19 22:40:08 +00:00
Tim Northover ea904f9424 GlobalISel: teach legalizer how to handle integer constants.
llvm-svn: 279340
2016-08-19 22:40:00 +00:00
Matthias Braun a7d6fc9618 MachineFunction: Cleanup/simplify MachineFunctionProperties::print()
- Always compile print() regardless of LLVM_ENABLE_DUMP. (We usually
  only gard dump() functions with that).
- Only show the set properties to reduce output clutter.
- Remove the unused variant that even shows the unset properties.
- Fix comments

llvm-svn: 279338
2016-08-19 22:31:45 +00:00
Matthias Braun a3b983aa5e MachineFunction: Make LastProperty an alias of the last property
This avoids unnecessary cases in switch statements covering all
properties.

llvm-svn: 279337
2016-08-19 22:31:42 +00:00
Tim Shen b5e0f5ac95 [GraphTraits] Make nodes_iterator dereference to NodeType*/NodeRef
Currently nodes_iterator may dereference to a NodeType* or a NodeType&. Make them all dereference to NodeType*, which is NodeRef later.

Differential Revision: https://reviews.llvm.org/D23704
Differential Revision: https://reviews.llvm.org/D23705

llvm-svn: 279326
2016-08-19 21:20:13 +00:00
Krzysztof Parzyszek e4582d4a2e [Packetizer] Add debugging code to stop packetization after N instructions
llvm-svn: 279325
2016-08-19 21:12:52 +00:00
Tim Northover d5c23bcfc9 GlobalISel: translate floating-point comparisons
llvm-svn: 279319
2016-08-19 20:48:16 +00:00
Tim Northover b16734fbaa GlobalISel: translate floating-point constants
llvm-svn: 279311
2016-08-19 20:09:15 +00:00
Tim Northover 5a28c3642f GlobalISel: support translating select instructions.
llvm-svn: 279309
2016-08-19 20:09:07 +00:00
Tim Northover b604622bba GlobalISel: fix insert/extract to work on ConstantExprs too.
No tests yet unfortunately (ConstantFolding reduces all supported constants to
ConstantInts before we get to translation). Soon.

llvm-svn: 279308
2016-08-19 20:09:03 +00:00
Tim Northover bbbfb1cfb8 GlobalISel: translate insertvalue instructions.
This adds a G_INSERT instruction, which technically makes G_SEQUENCE redundant
(it's equivalent to a G_INSERT into an IMPLICIT_DEF). We'll leave G_SEQUENCE
for now though: it's likely to be far more common as it's a fundamental part of
legalization, so avoiding the mess and bloat of the extra IMPLICIT_DEFs is
probably worthwhile.

llvm-svn: 279306
2016-08-19 20:08:55 +00:00
Tom Stellard 68726a5359 MachineScheduler: Add constructor functions for the DAGMutations
Summary: This way they can be re-used by target-specific schedulers.

Reviewers: atrick, MatzeB, kparzysz

Subscribers: kparzysz, llvm-commits, MatzeB

Differential Revision: https://reviews.llvm.org/D23678

llvm-svn: 279305
2016-08-19 19:59:18 +00:00
Tim Northover 26b76f2c59 GlobalISel: improve representation of G_SEQUENCE and G_EXTRACT
First, make sure all types involved are represented, rather than being implicit
from the register width.

Second, canonicalize all types to scalar. These operations just act in bits and
don't care about vectors.

Also standardize spelling of Indices in the MachineIRBuilder (NFC here).

llvm-svn: 279294
2016-08-19 18:32:14 +00:00
Kyle Butt 5b10483618 Revert "IfConversion: Rescan diamonds."
This reverts commit bfd62a4b4465dd21811bf615c3b04c30ddb09f7b.

llvm-svn: 279289
2016-08-19 18:17:06 +00:00
Kyle Butt ce0196de3f Revert "CodeGen: If Convert blocks that would form a diamond when tail-merged."
This reverts commit 0fda93481c4231c06b838ef476c0c404c51ff875.

llvm-svn: 279288
2016-08-19 18:17:04 +00:00
Tim Northover 2fa5fa391f GlobalISel: allow extractvalue to extract an aggregate.
llvm-svn: 279287
2016-08-19 18:09:41 +00:00
Tim Northover 6f80b08c64 GlobalISel: support translation of extractvalue instructions.
llvm-svn: 279285
2016-08-19 17:47:05 +00:00
Tim Northover 91c8173093 GlobalISel: support overflow arithmetic intrinsics.
Unsigned addition and subtraction can reuse the instructions created to
legalize large width operations (i.e. both produce and consume a carry flag).
Signed operations and multiplies get a dedicated op-with-overflow instruction.

Once this is produced the two values are combined into a struct register (which
will almost always be merged with a corresponding G_EXTRACT as part of
legalization).

llvm-svn: 279278
2016-08-19 17:17:06 +00:00
James Molloy 7ee640f9b6 [CodeGen] Fix a trivial type conversion bug dating back to pre-2008
The heuristic above this code is incredibly suspect, but disregarding that it mutates the cast opcode so we need to check the *mutated* opcode later to see if we need to emit an AssertSext or AssertZext node.

Fixes PR29041.

llvm-svn: 279223
2016-08-19 08:38:50 +00:00
Matthias Braun fdc4c6b426 Revert "RegScavenging: Add scavengeRegisterBackwards()"
The ppc64 multistage bot fails on this.

This reverts commit r279124.

Also Revert "CodeGen: Add/Factor out LiveRegUnits class; NFCI" because it depends on the previous change
This reverts commit r279171.

llvm-svn: 279199
2016-08-19 03:03:24 +00:00