Commit Graph

23755 Commits

Author SHA1 Message Date
Matt Arsenault f6ee94c1c6 DAG: Add computeKnownBitsForFrameIndex
Some of the AMDGPU stack addressing modes require knowing the sign
bit is zero. We used to accomplish this by custom lowering
frame indexes, and then putting an AssertZext around a
TargetFrameIndex. This required specifically looking for
the AssextZext + frame index pattern which was moderately
disgusting. The same could probably be accomplished
with a target specific node, but would still
require special handling of frame indexes.

llvm-svn: 317671
2017-11-08 08:52:31 +00:00
Serguei Katkov 3664aa8658 Revert "[CGP] Enable extending scope of optimizeMemoryInst"
Revert the patch r317665 causing buildbot failures.

llvm-svn: 317667
2017-11-08 05:38:54 +00:00
Serguei Katkov ee892325bf [CGP] Enable extending scope of optimizeMemoryInst
This patch enables the folding of address computation in
memory instruction in case adress is represented by Phi node.

The inputs of Phi node might be different in base register.

Differential Revision: https://reviews.llvm.org/D36073

llvm-svn: 317665
2017-11-08 05:02:51 +00:00
David Blaikie 3f833edc7c Target/TargetInstrInfo.h -> CodeGen/TargetInstrInfo.h to match layering
This header includes CodeGen headers, and is not, itself, included by
any Target headers, so move it into CodeGen to match the layering of its
implementation.

llvm-svn: 317647
2017-11-08 01:01:31 +00:00
Craig Topper 87e715fbac [CodeGenPrepare] Fix typo in comment. NFC
llvm-svn: 317614
2017-11-07 20:56:17 +00:00
Craig Topper 41bf240726 [SelectionDAG] Fix typo in comment. NFC
llvm-svn: 317588
2017-11-07 16:32:31 +00:00
Petar Jovanovic e2a585dddc Reland "Correct dwarf unwind information in function epilogue for X86"
Reland r317100 with minor fix regarding ComputeCommonTailLength function in
BranchFolding.cpp. Skipping top CFI instructions block needs to executed on
several more return points in ComputeCommonTailLength().

Original r317100 message:

"Correct dwarf unwind information in function epilogue for X86"

This patch aims to provide correct dwarf unwind information in function
epilogue for X86.

It consists of two parts. The first part inserts CFI instructions that set
appropriate cfa offset and cfa register in emitEpilogue() in
X86FrameLowering. This part is X86 specific.

The second part is platform independent and ensures that:

- CFI instructions do not affect code generation
- Unwind information remains correct when a function is modified by
  different passes. This is done in a late pass by analyzing information
  about cfa offset and cfa register in BBs and inserting additional CFI
  directives where necessary.

Changed CFI instructions so that they:

- are duplicable
- are not counted as instructions when tail duplicating or tail merging
- can be compared as equal

Added CFIInstrInserter pass:

- analyzes each basic block to determine cfa offset and register valid at
  its entry and exit
- verifies that outgoing cfa offset and register of predecessor blocks match
  incoming values of their successors
- inserts additional CFI directives at basic block beginning to correct the
  rule for calculating CFA

Having CFI instructions in function epilogue can cause incorrect CFA
calculation rule for some basic blocks. This can happen if, due to basic
block reordering, or the existence of multiple epilogue blocks, some of the
blocks have wrong cfa offset and register values set by the epilogue block
above them.

CFIInstrInserter is currently run only on X86, but can be used by any target
that implements support for adding CFI instructions in epilogue.

Patch by Violeta Vukobrat.

llvm-svn: 317579
2017-11-07 14:40:27 +00:00
Kristof Beyls 78aa4b28a3 Mark intentional fall-through with LLVM_FALLTHROUGH.
... to silence gcc 7's default -Wimplicit-fallthrough.

llvm-svn: 317573
2017-11-07 13:31:52 +00:00
Kristof Beyls 178818ba20 Silence C4715 warning from MSVC (NFC).
The warning started triggering after r317560.
This commit silences it in the same way as previously done in a similar
situation, see
http://lists.llvm.org/pipermail/llvm-commits/Week-of-Mon-20140915/236088.html

llvm-svn: 317568
2017-11-07 11:54:00 +00:00
Kristof Beyls af9814a1fc [GlobalISel] Enable legalizing non-power-of-2 sized types.
This changes the interface of how targets describe how to legalize, see
the below description.

1. Interface for targets to describe how to legalize.

In GlobalISel, the API in the LegalizerInfo class is the main interface
for targets to specify which types are legal for which operations, and
what to do to turn illegal type/operation combinations into legal ones.

For each operation the type sizes that can be legalized without having
to change the size of the type are specified with a call to setAction.
This isn't different to how GlobalISel worked before. For example, for a
target that supports 32 and 64 bit adds natively:

  for (auto Ty : {s32, s64})
    setAction({G_ADD, 0, s32}, Legal);

or for a target that needs a library call for a 32 bit division:

  setAction({G_SDIV, s32}, Libcall);

The main conceptual change to the LegalizerInfo API, is in specifying
how to legalize the type sizes for which a change of size is needed. For
example, in the above example, how to specify how all types from i1 to
i8388607 (apart from s32 and s64 which are legal) need to be legalized
and expressed in terms of operations on the available legal sizes
(again, i32 and i64 in this case). Before, the implementation only
allowed specifying power-of-2-sized types (e.g. setAction({G_ADD, 0,
s128}, NarrowScalar).  A worse limitation was that if you'd wanted to
specify how to legalize all the sized types as allowed by the LLVM-IR
LangRef, i1 to i8388607, you'd have to call setAction 8388607-3 times
and probably would need a lot of memory to store all of these
specifications.

Instead, the legalization actions that need to change the size of the
type are specified now using a "SizeChangeStrategy".  For example:

   setLegalizeScalarToDifferentSizeStrategy(
       G_ADD, 0, widenToLargerAndNarrowToLargest);

This example indicates that for type sizes for which there is a larger
size that can be legalized towards, do it by Widening the size.
For example, G_ADD on s17 will be legalized by first doing WidenScalar
to make it s32, after which it's legal.
The "NarrowToLargest" indicates what to do if there is no larger size
that can be legalized towards. E.g. G_ADD on s92 will be legalized by
doing NarrowScalar to s64.

Another example, taken from the ARM backend is:
   for (unsigned Op : {G_SDIV, G_UDIV}) {
     setLegalizeScalarToDifferentSizeStrategy(Op, 0,
         widenToLargerTypesUnsupportedOtherwise);
     if (ST.hasDivideInARMMode())
       setAction({Op, s32}, Legal);
     else
       setAction({Op, s32}, Libcall);
   }

For this example, G_SDIV on s8, on a target without a divide
instruction, would be legalized by first doing action (WidenScalar,
s32), followed by (Libcall, s32).

The same principle is also followed for when the number of vector lanes
on vector data types need to be changed, e.g.:

   setAction({G_ADD, LLT::vector(8, 8)}, LegalizerInfo::Legal);
   setAction({G_ADD, LLT::vector(16, 8)}, LegalizerInfo::Legal);
   setAction({G_ADD, LLT::vector(4, 16)}, LegalizerInfo::Legal);
   setAction({G_ADD, LLT::vector(8, 16)}, LegalizerInfo::Legal);
   setAction({G_ADD, LLT::vector(2, 32)}, LegalizerInfo::Legal);
   setAction({G_ADD, LLT::vector(4, 32)}, LegalizerInfo::Legal);
   setLegalizeVectorElementToDifferentSizeStrategy(
       G_ADD, 0, widenToLargerTypesUnsupportedOtherwise);

As currently implemented here, vector types are legalized by first
making the vector element size legal, followed by then making the number
of lanes legal. The strategy to follow in the first step is set by a
call to setLegalizeVectorElementToDifferentSizeStrategy, see example
above.  The strategy followed in the second step
"moreToWiderTypesAndLessToWidest" (see code for its definition),
indicating that vectors are widened to more elements so they map to
natively supported vector widths, or when there isn't a legal wider
vector, split the vector to map it to the widest vector supported.

Therefore, for the above specification, some example legalizations are:
  * getAction({G_ADD, LLT::vector(3, 3)})
    returns {WidenScalar, LLT::vector(3, 8)}
  * getAction({G_ADD, LLT::vector(3, 8)})
    then returns {MoreElements, LLT::vector(8, 8)}
  * getAction({G_ADD, LLT::vector(20, 8)})
    returns {FewerElements, LLT::vector(16, 8)}


2. Key implementation aspects.

How to legalize a specific (operation, type index, size) tuple is
represented by mapping intervals of integers representing a range of
size types to an action to take, e.g.:

       setScalarAction({G_ADD, LLT:scalar(1)},
                       {{1, WidenScalar},  // bit sizes [ 1, 31[
                        {32, Legal},       // bit sizes [32, 33[
                        {33, WidenScalar}, // bit sizes [33, 64[
                        {64, Legal},       // bit sizes [64, 65[
                        {65, NarrowScalar} // bit sizes [65, +inf[
                       });

Please note that most of the code to do the actual lowering of
non-power-of-2 sized types is currently missing, this is just trying to
make it possible for targets to specify what is legal, and how non-legal
types should be legalized.  Probably quite a bit of further work is
needed in the actual legalizing and the other passes in GlobalISel to
support non-power-of-2 sized types.

I hope the documentation in LegalizerInfo.h and the examples provided in the
various {Target}LegalizerInfo.cpp and LegalizerInfoTest.cpp explains well
enough how this is meant to be used.

This drops the need for LLT::{half,double}...Size().


Differential Revision: https://reviews.llvm.org/D30529

llvm-svn: 317560
2017-11-07 10:34:34 +00:00
Serguei Katkov 365200295a [CGP] Disable Select instruction handling in optimizeMemoryInst. NFC
This patch disables the handling of selects in optimization
extensing scope of optimizeMemoryInst.

The optimization itself is disable by default.
The idea here is just to switch optimiztion level step by step.

Specifically, first optimization will be enabled only for Phi nodes,
then select instructions will be added.

In case someone will complain about perfromance it will be easier to
detect what part of optimizations is responsible for that.

Differential Revision: https://reviews.llvm.org/D36073

llvm-svn: 317555
2017-11-07 09:43:08 +00:00
Adrian Prantl 25a09dd408 Make DIExpression::createFragmentExpression() return an Optional.
We can't safely split arithmetic into multiple fragments because we
can't express carry-over between fragments.

llvm-svn: 317534
2017-11-07 00:45:34 +00:00
Bjorn Pettersson a42ed3e361 [MIRPrinter] Use %subreg.xxx syntax for subregister index operands
Summary:
Print %subreg.<subregidxname> instead of just the subregister
index when printing immediate operands corresponding to subreg
indices in INSERT_SUBREG, EXTRACT_SUBREG, SUBREG_TO_REG and
REG_SEQUENCE.

Reviewers: qcolombet, MatzeB

Reviewed By: MatzeB

Subscribers: nhaehnle, javed.absar, llvm-commits

Differential Revision: https://reviews.llvm.org/D39696

llvm-svn: 317513
2017-11-06 21:46:06 +00:00
Sanjay Patel 629c411538 [IR] redefine 'UnsafeAlgebra' / 'reassoc' fast-math-flags and add 'trans' fast-math-flag
As discussed on llvm-dev:
http://lists.llvm.org/pipermail/llvm-dev/2016-November/107104.html
and again more recently:
http://lists.llvm.org/pipermail/llvm-dev/2017-October/118118.html

...this is a step in cleaning up our fast-math-flags implementation in IR to better match
the capabilities of both clang's user-visible flags and the backend's flags for SDNode.

As proposed in the above threads, we're replacing the 'UnsafeAlgebra' bit (which had the 
'umbrella' meaning that all flags are set) with a new bit that only applies to algebraic 
reassociation - 'AllowReassoc'.

We're also adding a bit to allow approximations for library functions called 'ApproxFunc' 
(this was initially proposed as 'libm' or similar).

...and we're out of bits. 7 bits ought to be enough for anyone, right? :) FWIW, I did 
look at getting this out of SubclassOptionalData via SubclassData (spacious 16-bits), 
but that's apparently already used for other purposes. Also, I don't think we can just 
add a field to FPMathOperator because Operator is not intended to be instantiated. 
We'll defer movement of FMF to another day.

We keep the 'fast' keyword. I thought about removing that, but seeing IR like this:
%f.fast = fadd reassoc nnan ninf nsz arcp contract afn float %op1, %op2
...made me think we want to keep the shortcut synonym.

Finally, this change is binary incompatible with existing IR as seen in the 
compatibility tests. This statement:
"Newer releases can ignore features from older releases, but they cannot miscompile 
them. For example, if nsw is ever replaced with something else, dropping it would be 
a valid way to upgrade the IR." 
( http://llvm.org/docs/DeveloperPolicy.html#ir-backwards-compatibility )
...provides the flexibility we want to make this change without requiring a new IR 
version. Ie, we're not loosening the FP strictness of existing IR. At worst, we will 
fail to optimize some previously 'fast' code because it's no longer recognized as 
'fast'. This should get fixed as we audit/squash all of the uses of 'isFast()'.

Note: an inter-dependent clang commit to use the new API name should closely follow 
commit.

Differential Revision: https://reviews.llvm.org/D39304

llvm-svn: 317488
2017-11-06 16:27:15 +00:00
Serguei Katkov aee6375b02 [CGP] Fix the bug found by asan.
Try to fix the asan failure introduced by r317429.

llvm-svn: 317431
2017-11-05 07:59:02 +00:00
Serguei Katkov d5d8d54b08 [CGP] Extends the scope of optimizeMemoryInst optimization
This is an implementation of PR26223.

Currently optimizeMemoryInst optimization tries to fold address computation
if all possible way to get compute the address are of the form

baseGV + base + scale * Index + offset
where scale and offset are constants and baseGV, base and Index are exactly
the same instructions if defined.

The patch extends this optimization to allow different bases. In this case
it tries to find/build a Phi node merging all possible bases and use this Phi node
as a base for sunk address computation. Also it supports Select instruction on
the way.

The main motivation for this scope extension is GCRelocateInst.
If there is a relocation of derived pointer it will be represented as relocation of base + offset.
Also there will be a Phi node merging address computation for relocated derived pointer
and derived pointer itself. If we have a Phi node merging original base and relocated base
and can fold the address computation of derived pointer then we can potentially reduce
the code size and Phi node for derived pointer. The later can have a positive impact to
register allocator.

Reviewers: efriedma, dberlin, mkazantsev, reames, john.brawn
Reviewed By: john.brawn
Subscribers: javed.absar, john.brawn, dneilson, llvm-commits
Differential Revision: https://reviews.llvm.org/D36073

llvm-svn: 317429
2017-11-05 05:50:33 +00:00
David Blaikie 1be62f0327 Move TargetFrameLowering.h to CodeGen where it's implemented
This header already includes a CodeGen header and is implemented in
lib/CodeGen, so move the header there to match.

This fixes a link error with modular codegeneration builds - where a
header and its implementation are circularly dependent and so need to be
in the same library, not split between two like this.

llvm-svn: 317379
2017-11-03 22:32:11 +00:00
Adrian Prantl 261ac8b23c Invoke salvageDebugInfo from CodeGenPrepare's SinkCast()
This preserves the debug info for the cast operation in the original location.

rdar://problem/33460652

Reapplied r317340 with the test moved into an ARM-specific directory.

llvm-svn: 317375
2017-11-03 21:55:03 +00:00
David Blaikie 526f30b8aa Modularize: Include some required headers
DenseMaps require the definition of a type to be available when using a
pointer to that type as a key to know how many bits are available for
tombstone/etc.

llvm-svn: 317360
2017-11-03 20:24:19 +00:00
Adrian Prantl 8fe9fb0ae5 Revert "Invoke salvageDebugInfo from CodeGenPrepare's SinkCast()"
This reverts commit 317342 while investigating bot breakage.

llvm-svn: 317345
2017-11-03 18:26:36 +00:00
Craig Topper 666e23b513 [CodeGen] Remove unnecessary semicolons to fix a warning. NFC
llvm-svn: 317342
2017-11-03 18:02:46 +00:00
Adrian Prantl 58e9a0bb16 Invoke salvageDebugInfo from CodeGenPrepare's SinkCast()
This preserves the debug info for the cast operation in the original location.

rdar://problem/33460652

llvm-svn: 317340
2017-11-03 18:00:02 +00:00
Clement Courbet 063bed9baf re-land [ExpandMemCmp] Split ExpandMemCmp from CodeGen into its own pass."
Fix undefined references: ExpandMemCmp belongs to CodeGen/, not Scalar/.

llvm-svn: 317318
2017-11-03 12:12:27 +00:00
Francis Visoiu Mistrih d96395fc92 [PEI] Simplify handling of targets with no phys regs. NFC
Make doSpillCalleeSavedRegs a member function, instead of passing most of the
members of PEI as arguments.

Differential Review: https://reviews.llvm.org/D35642

llvm-svn: 317309
2017-11-03 09:46:36 +00:00
Puyan Lotfi a521c4ac55 mir-canon: First commit.
mir-canon (MIRCanonicalizerPass) is a pass designed to reorder instructions and
rename operands so that two similar programs will diff more cleanly after being
run through mir-canon than they would otherwise. This project is still a work
in progress and there are ideas still being discussed for improving diff
quality.

M    include/llvm/InitializePasses.h
M    lib/CodeGen/CMakeLists.txt
M    lib/CodeGen/CodeGen.cpp
A    lib/CodeGen/MIRCanonicalizerPass.cpp

llvm-svn: 317285
2017-11-02 23:37:32 +00:00
Hiroshi Yamauchi dce9def3dd Irreducible loop metadata for more accurate block frequency under PGO.
Summary:
Currently the block frequency analysis is an approximation for irreducible
loops.

The new irreducible loop metadata is used to annotate the irreducible loop
headers with their header weights based on the PGO profile (currently this is
approximated to be evenly weighted) and to help improve the accuracy of the
block frequency analysis for irreducible loops.

This patch is a basic support for this.

Reviewers: davidxl

Reviewed By: davidxl

Subscribers: mehdi_amini, llvm-commits, eraman

Differential Revision: https://reviews.llvm.org/D39028

llvm-svn: 317278
2017-11-02 22:26:51 +00:00
Clement Courbet 82bade615b Revert "[ExpandMemCmp] Split ExpandMemCmp from CodeGen into its own pass."
undefined reference to `llvm::TargetPassConfig::ID' on
clang-ppc64le-linux-multistage

This reverts commit eea333c33fa73ad225ef28607795984829f65688.

llvm-svn: 317213
2017-11-02 15:53:10 +00:00
Clement Courbet 1dc37b9c3b [ExpandMemCmp] Split ExpandMemCmp from CodeGen into its own pass.
Summary:
This is mostly a noop (most of the test diffs are renamed blocks).
There are a few temporary register renames (eax<->ecx) and a few blocks are
shuffled around.

See the discussion in PR33325 for more details.

Reviewers: spatel

Subscribers: mgorny

Differential Revision: https://reviews.llvm.org/D39456

llvm-svn: 317211
2017-11-02 15:02:51 +00:00
Ayman Musa a37d1130d7 [X86] Fix bug in legalize vector types - Split large loads
When splitting a large load to smaller legally-typed loads, the last load should be padded to reach the size of the previous one so a CONCAT_VECTORS node could reunite them again.
The code currently pads the last load to reach the size of the first load (instead of the previous).

Differential Revision: https://reviews.llvm.org/D38495

Change-Id: Ib60b55ed26ce901fabf68108daf52683fbd5013f
llvm-svn: 317206
2017-11-02 13:07:06 +00:00
Francis Visoiu Mistrih 66d2c269dc [AsmPrinterDwarf] Add support for .cfi_restore directive
As of today we only use .cfi_offset to specify the offset of a CSR, but
we never use .cfi_restore when the CSR is restored.

If we want to perform a more advanced type of shrink-wrapping, we need
to use .cfi_restore in order to switch the CFI state between blocks.

This patch only aims at adding support for the directive.

Differential Revision: https://reviews.llvm.org/D36114

llvm-svn: 317199
2017-11-02 12:00:58 +00:00
Petar Jovanovic bb5c84fb57 Revert "Correct dwarf unwind information in function epilogue for X86"
This reverts r317100 as it introduced sanitizer-x86_64-linux-autoconf
buildbot failure (build #15606).

llvm-svn: 317136
2017-11-01 23:05:52 +00:00
Petar Jovanovic f2faee92aa Correct dwarf unwind information in function epilogue for X86
This patch aims to provide correct dwarf unwind information in function
epilogue for X86.

It consists of two parts. The first part inserts CFI instructions that set
appropriate cfa offset and cfa register in emitEpilogue() in
X86FrameLowering. This part is X86 specific.

The second part is platform independent and ensures that:

- CFI instructions do not affect code generation
- Unwind information remains correct when a function is modified by
  different passes. This is done in a late pass by analyzing information
  about cfa offset and cfa register in BBs and inserting additional CFI
  directives where necessary.

Changed CFI instructions so that they:

- are duplicable
- are not counted as instructions when tail duplicating or tail merging
- can be compared as equal

Added CFIInstrInserter pass:

- analyzes each basic block to determine cfa offset and register valid at
  its entry and exit
- verifies that outgoing cfa offset and register of predecessor blocks match
  incoming values of their successors
- inserts additional CFI directives at basic block beginning to correct the
  rule for calculating CFA

Having CFI instructions in function epilogue can cause incorrect CFA
calculation rule for some basic blocks. This can happen if, due to basic
block reordering, or the existence of multiple epilogue blocks, some of the
blocks have wrong cfa offset and register values set by the epilogue block
above them.

CFIInstrInserter is currently run only on X86, but can be used by any target
that implements support for adding CFI instructions in epilogue.


Patch by Violeta Vukobrat.

Differential Revision: https://reviews.llvm.org/D35844

llvm-svn: 317100
2017-11-01 16:04:11 +00:00
Simon Pilgrim 687982c181 [SelectionDAG] computeKnownBits - use ashrInPlace on known bits of ISD::SRA input. NFCI.
llvm-svn: 317087
2017-11-01 13:16:48 +00:00
Craig Topper c51aac675d [DAGCombiner] Fix typos in comments. NFC
llvm-svn: 317072
2017-11-01 03:30:52 +00:00
Reid Kleckner bc6f52da82 [codeview] Merge file checksum entries for DIFiles with the same absolute path
Change the map key from DIFile* to the absolute path string. Computing
the absolute path isn't expensive because we already have a map that
caches the full path keyed on DIFile*.

llvm-svn: 317041
2017-10-31 21:52:15 +00:00
Serguei Katkov f66a59ee88 [CGP] Fix the detection of trivial case for addressing mode
The address can be presented as a bitcast of baseReg.
In this case it is still trivial but OriginalValue != baseReg.

llvm-svn: 316980
2017-10-31 07:01:35 +00:00
Philip Reames 9c3cbeea39 [CGP] Fix crash on i96 bit multiply
Issue found by llvm-isel-fuzzer on OSS fuzz, https://bugs.chromium.org/p/oss-fuzz/issues/detail?id=3725

If anyone actually cares about > 64 bit arithmetic, there's a lot more to do in this area.  There's a bunch of obviously wrong code in the same function.  I don't have the time to fix all of them and am just using this to understand what the workflow for fixing fuzzer cases might look like.

llvm-svn: 316967
2017-10-30 23:59:51 +00:00
Simon Pilgrim 9cd7abbcff Fix unused variable warnings. NFCI.
llvm-svn: 316964
2017-10-30 22:38:07 +00:00
Simon Pilgrim 80b371361c [SelectionDAG] Tidyup computeKnownBits extension/truncation cases. NFCI.
We don't need to extend/truncate the Known structure before calling computeKnownBits - it will reset at the start of the function.

llvm-svn: 316962
2017-10-30 22:23:57 +00:00
Daniel Neilson f9c7d29c77 Create instruction classes for identifying any atomicity of memory intrinsic. (NFC)
Summary:
For reference, see: http://lists.llvm.org/pipermail/llvm-dev/2017-August/116589.html

This patch fleshes out the instruction class hierarchy with respect to atomic and
non-atomic memory intrinsics. With this change, the relevant part of the class
hierarchy becomes:

IntrinsicInst
  -> MemIntrinsicBase (methods-only class)
    -> MemIntrinsic (non-atomic intrinsics)
      -> MemSetInst
      -> MemTransferInst
        -> MemCpyInst
        -> MemMoveInst
    -> AtomicMemIntrinsic (atomic intrinsics)
      -> AtomicMemSetInst
      -> AtomicMemTransferInst
        -> AtomicMemCpyInst
        -> AtomicMemMoveInst
    -> AnyMemIntrinsic (both atomicities)
      -> AnyMemSetInst
      -> AnyMemTransferInst
        -> AnyMemCpyInst
        -> AnyMemMoveInst

This involves some class renaming:
    ElementUnorderedAtomicMemCpyInst -> AtomicMemCpyInst
    ElementUnorderedAtomicMemMoveInst -> AtomicMemMoveInst
    ElementUnorderedAtomicMemSetInst -> AtomicMemSetInst
A script for doing this renaming in downstream trees is included below.

An example of where the Any* classes should be used in LLVM is when reasoning
about the effects of an instruction (ex: aliasing).

---
Script for renaming AtomicMem* classes:
PREFIXES="[<,([:space:]]"
CLASSES="MemIntrinsic|MemTransferInst|MemSetInst|MemMoveInst|MemCpyInst"
SUFFIXES="[;)>,[:space:]]"

REGEX="(${PREFIXES})ElementUnorderedAtomic(${CLASSES})(${SUFFIXES})"
REGEX2="visitElementUnorderedAtomic(${CLASSES})"

FILES=$( grep -E "(${REGEX}|${REGEX2})" -r . | tr ':' ' ' | awk '{print $1}' | sort | uniq )

SED_SCRIPT="s~${REGEX}~\1Atomic\2\3~g"
SED_SCRIPT2="s~${REGEX2}~visitAtomic\1~g"

for f in $FILES; do
    echo "Processing: $f"
    sed  -i ".bak" -E "${SED_SCRIPT};${SED_SCRIPT2};${EA_SED_SCRIPT};${EA_SED_SCRIPT2}" $f
done

Reviewers: sanjoy, deadalnix, apilipenko, anna, skatkov, mkazantsev

Reviewed By: sanjoy

Subscribers: hfinkel, jholewinski, arsenm, sdardis, nhaehnle, JDevlieghere, javed.absar, llvm-commits

Differential Revision: https://reviews.llvm.org/D38419

llvm-svn: 316950
2017-10-30 19:51:48 +00:00
Simon Pilgrim 017f896adb [SelectionDAG] Add VSELECT demanded elts support to computeKnownBits
llvm-svn: 316947
2017-10-30 19:31:08 +00:00
Simon Pilgrim 96a0b9ef54 [SelectionDAG] Add VSELECT support to computeKnownBits
llvm-svn: 316944
2017-10-30 19:08:21 +00:00
Simon Pilgrim 5da11dfd24 [SelectionDAG] Add SELECT demanded elts support to ComputeNumSignBits
llvm-svn: 316933
2017-10-30 17:53:51 +00:00
Simon Pilgrim 194693e996 [MC] Split out register def/use idx calls to make debugging simpler. NFCI.
llvm-svn: 316927
2017-10-30 17:24:40 +00:00
Clement Courbet b2c3eb8cf1 [CodeGen][ExpandMemcmp] Allow memcmp to expand to vector loads (2).
- Targets that want to support memcmp expansions now return the list of
   supported load sizes.
 - Expansion codegen does not assume that all power-of-two load sizes
   smaller than the max load size are valid. For examples, this is not the
   case for x86(32bit)+sse2.

Fixes PR34887.

llvm-svn: 316905
2017-10-30 14:19:33 +00:00
Javed Absar 5cde1ccb29 [GlobalISel|ARM] : Allow legalizing G_FSUB
Adding support for VSUB.
Reviewed by: @rovka
Differential Revision: https://reviews.llvm.org/D39261

llvm-svn: 316902
2017-10-30 13:51:56 +00:00
Simon Pilgrim 601ae238b7 [SelectionDAG] Add SEXT/AND/XOR/Or demanded elts support to ComputeNumSignBits
llvm-svn: 316875
2017-10-29 22:03:37 +00:00
Simon Pilgrim 7613a7b564 [SelectionDAG] Add SRA/SHL demanded elts support to ComputeNumSignBits
Introduce a isConstOrDemandedConstSplat helper function that can recognise a constant splat build vector for at least the demanded elts we care about.

llvm-svn: 316866
2017-10-29 18:19:37 +00:00
Simon Pilgrim b37a24e82f [SelectionDAG] Add support for INSERT_SUBVECTOR to computeKnownBits
llvm-svn: 316847
2017-10-28 22:10:40 +00:00
Simon Pilgrim d09c1ac20f [SelectionDAG] Support 'bit preserving' floating points bitcasts on computeKnownBits/ComputeNumSignBits
For cases where we know the floating point representations match the bitcasted integer equivalent, allow bitcasting to these types.

This is especially useful for the X86 floating point compare results which return all/zero bits but as a floating point type.

Differential Revision: https://reviews.llvm.org/D39289

llvm-svn: 316831
2017-10-28 14:27:53 +00:00
David Blaikie 8699f71310 Add a few missing headers for modularization/IWYU/etc
Several cases where class definitions are required for DenseMap pointer
traits handling.

llvm-svn: 316803
2017-10-27 22:12:46 +00:00
Guozhi Wei 7c67009fe5 [DAGCombine] Don't combine sext with extload if sextload is not supported and extload has multi users
In function DAGCombiner::visitSIGN_EXTEND_INREG, sext can be combined with extload even if sextload is not supported by target, then

  if sext is the only user of extload, there is no big difference, no harm no benefit.
  if extload has more than one user, the combined sextload may block extload from combining with other zext, causes extra zext instructions generated. As demonstrated by the attached test case.

This patch add the constraint that when sextload is not supported by target, sext can only be combined with extload if it is the only user of extload.

Differential Revision: https://reviews.llvm.org/D39108

llvm-svn: 316802
2017-10-27 21:54:24 +00:00
Clement Courbet e1eafe0a54 [CodeGen] Fix -Wunused-private-field warning on lld-x86_64-darwin13.
llvm-svn: 316765
2017-10-27 13:34:41 +00:00
Clement Courbet be684eee82 [CodeGen][ExpandMemCmp][NFC] Simplify load sequence generation.
llvm-svn: 316763
2017-10-27 12:34:18 +00:00
Matt Arsenault 878827d93a DAG: Fold fma (fneg x), K, y -> fma x, -K, y
llvm-svn: 316753
2017-10-27 09:06:07 +00:00
Sean Fertile 57d46b8436 Add subclass data to the FoldingSetNode for MemIntrinsicSDNodes.
Not having the subclass data on an MemIntrinsicSDNodes means it was possible
to try to fold 2 nodes with the same operands but differing MMO flags. This
would trip an assertion when trying to refine the alignment between the 2
MachineMemOperands.

Differential Revision: https://reviews.llvm.org/D38898

llvm-svn: 316737
2017-10-27 04:02:51 +00:00
Balaram Makam 32bcb5d7fb Revert "[CGP] Merge empty case blocks if no extra moves are added."
This reverts commit r316711. The domtree isn't getting updated correctly.

llvm-svn: 316721
2017-10-27 00:35:18 +00:00
Balaram Makam cddf3c5e1c [CGP] Merge empty case blocks if no extra moves are added.
Summary:
Currently we skip merging when extra moves may be added in the header of switch instead of the case block, if the case block is used as an incoming
block of a PHI. If all the incoming values of the PHIs are non-constants and the destination block is dominated by the switch block then extra moves are likely not added by ISel, so there is no need to skip merging in this case.

Reviewers: efriedma, junbuml, davidxl, hfinkel, qcolombet

Reviewed By: efriedma

Subscribers: dberlin, kuhar, mcrosier, llvm-commits

Differential Revision: https://reviews.llvm.org/D37343

llvm-svn: 316711
2017-10-26 22:34:01 +00:00
Mandeep Singh Grang 049ed12df7 [MachineModuleInfoImpls] Replace qsort with array_pod_sort
Summary:
This seems to be the only place in llvm we directly call qsort. We can replace
this with a call to array_pod_sort. Also minor cleanup of the sorting function.

Reviewers: bkramer, Eugene.Zelenko, rafael

Reviewed By: bkramer

Subscribers: efriedma, llvm-commits

Differential Revision: https://reviews.llvm.org/D39214

llvm-svn: 316671
2017-10-26 16:07:20 +00:00
Hans Wennborg caceb64067 Tidy up CountingFunctionInserter a little. NFC.
Use StringRef for CountingFunctionName, remove erroneous comment
copied from InstructionNamer, and drop some trailing whitespace.

llvm-svn: 316644
2017-10-26 08:29:08 +00:00
Aditya Nandakumar d2a954d0ae Make the combiner check if shifts are legal before creating them
Summary: Make sure shifts are legal/specified by the legalizerinfo before creating it

Reviewers: qcolombet, dsanders, rovka, t.p.northover

Subscribers: llvm-commits

Differential Revision: https://reviews.llvm.org/D39264

llvm-svn: 316602
2017-10-25 18:49:18 +00:00
Clement Courbet 0c7cd071f7 Re-land "[CodeGen][ExpandMemcmp][NFC] Allow memcmp to expand to vector loads (1)"
Compute the actual decomposition only after deciding whether to expand
of not. Else, it's easy to make the compiler OOM with:
`memcpy(dst, src, 0xffffffffffffffff);`, which typically happens if
someone mistakenly passes a negative value. Add a test.

This reverts commit f8fc02fbd4ab33383c010d33675acf9763d0bd44.

llvm-svn: 316567
2017-10-25 11:02:09 +00:00
Jonas Paulsson 238c14b6c7 [MachineScheduler] Minor refactoring.
Duplicated code found in three places put into a new static function:

/// Given a Count of resource usage and a Latency value, return true if a
/// SchedBoundary becomes resource limited.
static bool checkResourceLimit(unsigned LFactor, unsigned Count,
                               unsigned Latency) {
  return (int)(Count - (Latency * LFactor)) > (int)LFactor;
}

Review: Florian Hahn, Matthias Braun
https://reviews.llvm.org/D39235

llvm-svn: 316560
2017-10-25 08:23:33 +00:00
Matt Arsenault 8a752b77a2 DAG: Fix creating select with wrong condition type
This code added in r297930 assumed that it could create
a select with a condition type that is just an integer
bitcast of the selected type. For AMDGPU any vselect is
going to be scalarized (although the vector types are legal),
and all select conditions must be i1 (the same as getSetCCResultType).

This logic doesn't really make sense to me, but there's
never really been a consistent policy in what the select
condition mask type is supposed to be. Try to extend
the logic for skipping the transform for condition types
that aren't setccs. It doesn't seem quite right to me though,
but checking conditions that seem more sensible (like whether the
vselect is going to be expanded) doesn't work since this
seems to depend on that also.

llvm-svn: 316554
2017-10-25 07:14:07 +00:00
Adrian Prantl 2eb7cbf987 Implement salavageDebugInfo functionality for SelectionDAG.
Similar to how llvm::salvagDebugInfo hooks into InstCombine, this adds
a hook that can be invoked before an SDNode that is associated with an
SDDbgValue is erased to capture the effect of the deleted node in a
DIExpression.

The motivating example is an SDDebugValue attached to an ADD operation
that gets folded into a LOAD+OFFSET operation.

rdar://problem/32121503

llvm-svn: 316525
2017-10-24 22:55:12 +00:00
Martin Bohme 678c3e3633 Revert "[CodeGen][ExpandMemcmp][NFC] Allow memcmp to expand to vector loads (1)"
This reverts commit r316417, which causes internal compiles to OOM.
I don't unfortunately have a self-contained test case but will follow up
with courbet.

llvm-svn: 316497
2017-10-24 20:40:02 +00:00
Adrian Prantl 4569cedf16 Use range-based for loop. NFC
llvm-svn: 316496
2017-10-24 20:38:00 +00:00
Adrian Prantl 1a043aefc4 Use range-based-for. NFC
llvm-svn: 316485
2017-10-24 19:32:59 +00:00
Justin Bogner 6c452834a1 MIR: Print the register class or bank in vreg defs
This updates the MIRPrinter to include the regclass when printing
virtual register defs, which is already valid syntax for the
parser. That is, given 64 bit %0 and %1 in a "gpr" regbank,

  %1(s64) = COPY %0(s64)

would now be written as

  %1:gpr(s64) = COPY %0(s64)

While this change alone introduces a bit of redundancy with the
registers block, it allows us to update the tests to be more concise
and understandable and brings us closer to being able to remove the
registers block completely.

Note: We generally only print the class in defs, but there is one
exception. If there are uses without any defs whatsoever, we'll print
the class on all uses. I'm not completely convinced this comes up in
meaningful machine IR, but for now the MIRParser and MachineVerifier
both accept that kind of stuff, so we don't want to have a situation
where we can print something we can't parse.

llvm-svn: 316479
2017-10-24 18:04:54 +00:00
Adrian Prantl 93a3777b7d Doxygenify comments.
llvm-svn: 316466
2017-10-24 17:23:40 +00:00
Simon Pilgrim 1bc62f03a5 [SelectionDAG] Add VSELECT support to ComputeNumSignBits
llvm-svn: 316457
2017-10-24 16:38:38 +00:00
Clement Courbet efd5177d5e [CodeGen][ExpandMemcmp][NFC] Allow memcmp to expand to vector loads (1)
Refactor ExpandMemcmp:

 - Stop duplicating the logic for computation of the sequence of loads to
   generate (thsi was done in three different places), this is now done
   only once in MemCmpExpansion::MemCmpExpansion().

 - Add a FIXME to expose a bug with the computation of the number of loads
   when not all sizes are loadable. For example, on X86-32 + SSE, possible
   loads are {16,4,2,1} bytes. The current code considers that all loads
   starting at MaxLoadSize are possible. This is not an issue right now as
   vector loads are not enabled, so I'm not fixing the issue here to keep
   the change as small as possible. I'm going to address this in a
   subsequent revision, where I enable vector loads.

See https://bugs.llvm.org/show_bug.cgi?id=34887

Differential Revision: https://reviews.llvm.org/D38498

llvm-svn: 316417
2017-10-24 08:05:07 +00:00
Omer Paparo Bivas 2251c79aba [MC] Adding code padding for performance stability - infrastructure. NFC.
Infrastructure designed for padding code with nop instructions in key places such that preformance improvement will be achieved.
The infrastructure is implemented such that the padding is done in the Assembler after the layout is done and all IPs and alignments are known.
This patch by itself in a NFC. Future patches will make use of this infrastructure to implement required policies for code padding.

Reviewers:
aaboud
zvi
craig.topper
gadi.haber

Differential revision: https://reviews.llvm.org/D34393

Change-Id: I92110d0c0a757080a8405636914a93ef6f8ad00e
llvm-svn: 316413
2017-10-24 06:16:03 +00:00
Jessica Paquette 9df7fde269 [MachineOutliner] Add optimisation remarks for successful outlining
This commit adds optimisation remarks for outlining which fire when a function
is successfully outlined.

To do this, OutlinedFunctions must now contain references to their Candidates.
Since the Candidates must still be sorted and worked on separately, this is
done by working on everything in terms of shared_ptrs to Candidates. This is
good; it means that we can easily move everything to outlining in terms of
the OutlinedFunctions rather than the individual Candidates. This is far more
intuitive than what's currently there!

(Remarks are output when a function is created for some group of Candidates.
In a later commit, all of the outlining logic should be rewritten so that we
loop over OutlinedFunctions rather than over Candidates.)
 

llvm-svn: 316396
2017-10-23 23:36:46 +00:00
George Burgess IV 7887238c7c Fix buildbot breakage
SP is only used in an assert. Caused by r316374.

llvm-svn: 316377
2017-10-23 21:08:02 +00:00
George Burgess IV 8a0e4bc972 Don't crash when we see unallocatable registers in clobbers
This fixes a bug where we'd crash given code like the test-case from
https://bugs.llvm.org/show_bug.cgi?id=30792 . Instead, we let the
offending clobber silently slide through.

This doesn't fully fix said bug, since the assembler will still complain
the moment it sees a crypto/fp/vector op, and we still don't diagnose
calls that require vector regs.

Differential Revision: https://reviews.llvm.org/D39030

llvm-svn: 316374
2017-10-23 20:46:36 +00:00
Jessica Paquette 1934fd2c53 [MachineOutliner] NFC: Rename getters/setters to fit coding style
Rename endIdx, startIdx, and length to getEndIdx, getStartIdx, and getLength
in Candidate.

llvm-svn: 316341
2017-10-23 16:25:53 +00:00
Simon Pilgrim 32da2f9245 [DAGCombine] Permit combining of shuffles of equivalent splat BUILD_VECTORs
combineShuffleOfScalars is very conservative about shuffled BUILD_VECTORs that can be combined together.

This patch adds one additional case - if both BUILD_VECTORs represent splats of the same scalar value but with different UNDEF elements, then we should create a single splat BUILD_VECTOR, sharing only the UNDEF elements defined by the shuffle mask.

Differential Revision: https://reviews.llvm.org/D38696

llvm-svn: 316331
2017-10-23 15:48:08 +00:00
Marina Yatsina f9371d821f Add logic to greedy reg alloc to avoid bad eviction chains
This fixes bugzilla 26810
https://bugs.llvm.org/show_bug.cgi?id=26810

This is intended to prevent sequences like:
movl %ebp, 8(%esp) # 4-byte Spill
movl %ecx, %ebp
movl %ebx, %ecx
movl %edi, %ebx
movl %edx, %edi
cltd
idivl %esi
movl %edi, %edx
movl %ebx, %edi
movl %ecx, %ebx
movl %ebp, %ecx
movl 16(%esp), %ebp # 4 - byte Reload

Such sequences are created in 2 scenarios:

Scenario #1:
vreg0 is evicted from physreg0 by vreg1
Evictee vreg0 is intended for region splitting with split candidate physreg0 (the reg vreg0 was evicted from)
Region splitting creates a local interval because of interference with the evictor vreg1 (normally region spliiting creates 2 interval, the "by reg" and "by stack" intervals. Local interval created when interference occurs.)
one of the split intervals ends up evicting vreg2 from physreg1
Evictee vreg2 is intended for region splitting with split candidate physreg1
one of the split intervals ends up evicting vreg3 from physreg2 etc.. until someone spills

Scenario #2
vreg0 is evicted from physreg0 by vreg1
vreg2 is evicted from physreg2 by vreg3 etc
Evictee vreg0 is intended for region splitting with split candidate physreg1
Region splitting creates a local interval because of interference with the evictor vreg1
one of the split intervals ends up evicting back original evictor vreg1 from physreg0 (the reg vreg0 was evicted from)
Another evictee vreg2 is intended for region splitting with split candidate physreg1
one of the split intervals ends up evicting vreg3 from physreg2 etc.. until someone spills

As compile time was a concern, I've added a flag to control weather we do cost calculations for local intervals we expect to be created (it's on by default for X86 target, off for the rest).

Differential Revision: https://reviews.llvm.org/D35816

Change-Id: Id9411ff7bbb845463d289ba2ae97737a1ee7cc39
llvm-svn: 316295
2017-10-22 17:59:38 +00:00
Florian Hahn b0a263cf94 [SelectionDAG] Use dyn_cast without cast.
llvm-svn: 316258
2017-10-21 05:37:10 +00:00
Florian Hahn 3d81254b7c [SelectionDAG] Use isa to silence unused variable warning (NFC).
llvm-svn: 316257
2017-10-21 04:57:03 +00:00
Craig Topper 554151160f [SelectionDAG] Don't subject ConstantSDNodes to the depth limit in computeKnownBits and ComputeNumSignBits.
We don't need to do any additional recursion, we just need to analyze the APInt stored in the node. This matches what the ValueTracking versions do for IR.

llvm-svn: 316256
2017-10-21 03:22:13 +00:00
Craig Topper 195dad4264 [SelectionDAG] Don't subject ISD:Constant to the depth limit in TargetLowering::SimplifyDemandedBits.
Summary:
We shouldn't recurse any further but it doesn't mean we shouldn't be able to give the known bits for a constant. The caller would probably like that we always return the right answer for a constant RHS. This matches what InstCombine does in this case.

I don't have a test case because this showed up while trying to revive D31724.

Reviewers: RKSimon, spatel

Reviewed By: RKSimon

Subscribers: arsenm, llvm-commits

Differential Revision: https://reviews.llvm.org/D38967

llvm-svn: 316255
2017-10-21 02:27:19 +00:00
Krzysztof Parzyszek 9d19c8cac9 [Packetizer] Add function to check for aliasing between instructions
llvm-svn: 316243
2017-10-20 22:08:40 +00:00
Sam Clegg 12fd3da9d1 [WebAssembly] MC: Fix crash when -g specified.
At this point we don't output any debug sections or thier
relocations.

Differential Revision: https://reviews.llvm.org/D39076

llvm-svn: 316240
2017-10-20 21:28:38 +00:00
Craig Topper ff69ffbf9a [SelectionDAG] Add a check to getVectorShuffle to ensure that the only negative index we allow is -1.
llvm-svn: 316183
2017-10-19 20:59:41 +00:00
NAKAMURA Takumi 6f43bd4bde Untabify.
llvm-svn: 316079
2017-10-18 13:31:28 +00:00
Jessica Paquette 60d31fc3a9 [MachineOutliner][NFC] Clean up prune logic a bit
Move the prune logic in pruneOverlaps to a new function, prune. This lets us
reuse the prune functionality. Makes the code a bit more readable. It'll also
make it easier to emit remarks/debug statements for pruned functions.

llvm-svn: 316031
2017-10-17 21:11:58 +00:00
Jessica Paquette 85af63d044 [MachineOutliner][NFC] Move decrement logic to OutlinedFunction
This commit moves the decrement logic for outlined functions into the class,
and makes OccurrenceCount private. It can now be accessed via
getOccurrenceCount().

This makes it more difficult to accidentally introduce bugs by incorrectly
decrementing the occurrence count on OutlinedFunctions.

llvm-svn: 316020
2017-10-17 19:03:23 +00:00
Jessica Paquette c9ab4c2634 [MachineOutliner][NFC] Move end index calculation into Candidate
Cleanup to Candidate that moves all end index calculations into
Candidate.endIdx(). For the sake of consistency, StartIdx and Len are now
private members, and can be accessed with length() and startIdx() respectively.

llvm-svn: 316019
2017-10-17 18:43:15 +00:00
Simon Pilgrim 654bbb3b05 [DAGCombine] Add SCALAR_TO_VECTOR undef handling to simplifyShuffleMask.
This allows us to simplify later visitVECTOR_SHUFFLE optimizations such as combineShuffleOfScalars.

Noticed whilst working on D38696

llvm-svn: 316017
2017-10-17 18:14:48 +00:00
Yichao Yu a18b0b1817 Fix implicit null check with negative offset
Summary:
It seems that negative offset was accidentally allowed in D17967.
AFAICT small negative offset should be valid (always raise segfault) on all archs that I'm aware of (especially x86, which is the only one with this optimization enabled) and such case can be useful when loading hiden metadata from an object.

However, like the positive side, it should only be done within a certain limit.
For now, use the same limit on the positive side for the negative side.
A separate option can be added if needs appear.

Reviewers: mcrosier, skatkov

Reviewed By: skatkov

Subscribers: sanjoy, llvm-commits

Differential Revision: https://reviews.llvm.org/D38925

llvm-svn: 315991
2017-10-17 11:47:36 +00:00
Mark Searles 4e3d6160db Use the return value of UpdateNodeOperands(); in some cases, UpdateNodeOperands() modifies the node in-place and using the return value isn’t strictly necessary. However, it does not necessarily modify the node, but may return a resultant node if it already exists in the DAG. See comments in UpdateNodeOperands(). In that case, the return value must be used to avoid such scenarios as an infinite loop (node is assumed to have been updated, so added back to the worklist, and re-processed; however, node hasn’t changed so it is once again passed to UpdateNodeOperands(), assumed modified, added back to worklist; cycle infinitely repeats).
Differential Revision: https://reviews.llvm.org/D38466

llvm-svn: 315957
2017-10-16 23:38:53 +00:00
Krzysztof Parzyszek 72518eaa6f Add iterator range MachineRegisterInfo::liveins(), adopt users, NFC
llvm-svn: 315927
2017-10-16 19:08:41 +00:00
Alexander Timofeev 9dff31c769 [AMDGPU] : revert r315908
llvm-svn: 315916
2017-10-16 16:57:37 +00:00
Alexander Timofeev 3828242c7e [AMDGPU] Prevent Machine Copy Propagation from replacing live copy with the dead one
Differential revision: https://reviews.llvm.org/D38754

llvm-svn: 315908
2017-10-16 14:35:29 +00:00
Sjoerd Meijer 7508fbd581 ISel type legalizer: debug messages. NFC.
Minor addition and follow up of r314773 and r311533: this adds more
debug messages to the type legalizer. For each node, it dumps
legalization info for results and operands nodes, rather than just the
final legalized node.

Differential Revision: https://reviews.llvm.org/D38726

llvm-svn: 315904
2017-10-16 14:07:30 +00:00
Daniel Sanders ea8711b88e Re-commit r315885: [globalisel][tblgen] Add support for iPTR and implement am_unscaled* and am_indexed*
Summary:
iPTR is a pointer of subtarget-specific size to any address space. Therefore
type checks on this size derive the SizeInBits from a subtarget hook.

At this point, we can import the simplests G_LOAD rules and select load
instructions using them. Further patches will support for the predicates to
enable additional loads as well as the stores.

The previous commit failed on MSVC due to a failure to convert an
initializer_list to a std::vector. Hopefully, MSVC will accept this version.

Depends on D37457

Reviewers: ab, qcolombet, t.p.northover, rovka, aditya_nandakumar

Reviewed By: qcolombet

Subscribers: kristof.beyls, javed.absar, llvm-commits, igorb

Differential Revision: https://reviews.llvm.org/D37458

llvm-svn: 315887
2017-10-16 03:36:29 +00:00
Daniel Sanders ce72d611af Revert r315885: [globalisel][tblgen] Add support for iPTR and implement am_unscaled* and am_indexed*
MSVC doesn't like one of the constructors.

llvm-svn: 315886
2017-10-16 02:15:39 +00:00
Daniel Sanders 6735ea86cd [globalisel][tblgen] Add support for iPTR and implement am_unscaled* and am_indexed*
Summary:
iPTR is a pointer of subtarget-specific size to any address space. Therefore
type checks on this size derive the SizeInBits from a subtarget hook.

At this point, we can import the simplests G_LOAD rules and select load
instructions using them. Further patches will support for the predicates to
enable additional loads as well as the stores.

Depends on D37457

Reviewers: ab, qcolombet, t.p.northover, rovka, aditya_nandakumar

Reviewed By: qcolombet

Subscribers: kristof.beyls, javed.absar, llvm-commits, igorb

Differential Revision: https://reviews.llvm.org/D37458

llvm-svn: 315885
2017-10-16 01:16:35 +00:00
Daniel Sanders df39cbae2f Re-commit r315863: [globalisel][tablegen] Import ComplexPattern when used as an operator
Summary:
It's possible for a ComplexPattern to be used as an operator in a match
pattern. This is used by the load/store patterns in AArch64 to name the
suboperands returned by ComplexPattern predicate so that they can be broken
apart and referenced independently in the result pattern.

This patch adds support for this in order to enable the import of load/store
patterns.

Depends on D37445

Hopefully fixed the ambiguous constructor that a large number of bots reported.

Reviewers: ab, qcolombet, t.p.northover, rovka, aditya_nandakumar

Reviewed By: qcolombet

Subscribers: aemerson, javed.absar, igorb, llvm-commits, kristof.beyls

Differential Revision: https://reviews.llvm.org/D37456

llvm-svn: 315869
2017-10-15 18:22:54 +00:00
Daniel Sanders bb082a36d3 Revert r315863: [globalisel][tablegen] Import ComplexPattern when used as an operator
A large number of bots are failing on an ambiguous constructor call.

llvm-svn: 315866
2017-10-15 17:51:07 +00:00
Daniel Sanders b95b867dd8 [globalisel][tablegen] Import ComplexPattern when used as an operator
Summary:
It's possible for a ComplexPattern to be used as an operator in a match
pattern. This is used by the load/store patterns in AArch64 to name the
suboperands returned by ComplexPattern predicate so that they can be broken
apart and referenced independently in the result pattern.

This patch adds support for this in order to enable the import of load/store
patterns.

Depends on D37445

Reviewers: ab, qcolombet, t.p.northover, rovka, aditya_nandakumar

Reviewed By: qcolombet

Subscribers: aemerson, javed.absar, igorb, llvm-commits, kristof.beyls

Differential Revision: https://reviews.llvm.org/D37456

llvm-svn: 315863
2017-10-15 17:03:36 +00:00
Aaron Ballman 615eb47035 Reverting r315590; it did not include changes for llvm-tblgen, which is causing link errors for several people.
Error LNK2019 unresolved external symbol "public: void __cdecl `anonymous namespace'::MatchableInfo::dump(void)const " (?dump@MatchableInfo@?A0xf4f1c304@@QEBAXXZ) referenced in function "public: void __cdecl `anonymous namespace'::AsmMatcherEmitter::run(class llvm::raw_ostream &)" (?run@AsmMatcherEmitter@?A0xf4f1c304@@QEAAXAEAVraw_ostream@llvm@@@Z) llvm-tblgen D:\llvm\2017\utils\TableGen\AsmMatcherEmitter.obj 1

llvm-svn: 315854
2017-10-15 14:32:27 +00:00
Quentin Colombet 4a6b75012d [RegisterBankInfo] Cache the getMinimalPhysRegClass information
TargetRegisterInfo::getMinimalPhysRegClass is actually pretty expensive
because it has to iterate over all the register classes.
Cache this information as we need and get it so that we limit its usage.
Right now, we heavily rely on it, because this is how we get the mapping
for vregs defined by copies from physreg (i.e., the one that are ABI
related).

Improve compile time by up to 10% for that pass.

NFC

llvm-svn: 315759
2017-10-13 21:16:15 +00:00
Quentin Colombet d58265ad55 [Legalizer] Use SmallSetVector instead of SetVector.
NFC

llvm-svn: 315758
2017-10-13 21:16:14 +00:00
Quentin Colombet b1a3bd1529 [LegalizerInfo] Don't evaluate end boundary every time through the loop
Match the LLVM coding standard for loop conditions.

NFC.

llvm-svn: 315757
2017-10-13 21:16:13 +00:00
Quentin Colombet 86220488b4 [Legalizer] Only allocate the SetVectors once per function.
Prior to this patch we used to create SetVectors in temporaries that
were created and destroyed for each instruction. Now, instead we create
and destroyed them only once, but clear the content for each
instruction.
This speeds up the pass by ~25%.

NFC.

llvm-svn: 315756
2017-10-13 21:16:05 +00:00
Matt Arsenault f2db97d8fa DAG: Add opcode and source type to isFPExtFree
This is only currently used for mad/fma transforms.
This is the only case where it should be used for AMDGPU,
so add an opcode to be sure.

llvm-svn: 315740
2017-10-13 19:55:45 +00:00
Matt Arsenault 846dd6b2fa DAG: Add flags to dumps
llvm-svn: 315690
2017-10-13 15:41:40 +00:00
Craig Topper 6794eb8ba1 [SelectionDAG] Cleanup the SIGN_EXTEND_INREG handling in computeKnownBits. NFCI
Use less temporary APInts. Use bit counting more. Don't call getScalarSizeInBits so many places, just capture it once.

llvm-svn: 315671
2017-10-13 05:35:35 +00:00
Craig Topper bde7abe9c9 [SelectionDAG] Fix typo in comment. NFC
llvm-svn: 315670
2017-10-13 05:35:34 +00:00
Craig Topper d6630b9889 [SelectionDAG] Correct the early out in SelectionDAG::getZeroExtendInReg to work properly for vector types.
I don't know if we ever hit this case or not. Turning it into an assert only fired on expanding some atomic operation in a SystemZ lit test.

llvm-svn: 315648
2017-10-13 00:18:58 +00:00
Adam Nemet 01104aee1a Add DK_Remark to SMDiagnostic
Swift uses SMDiagnostic for diagnostic messages. For
https://github.com/apple/swift/pull/12294, we need remark support.

I picked the color that clang uses to display them.

Differential Revision: https://reviews.llvm.org/D38865

llvm-svn: 315642
2017-10-12 23:56:02 +00:00
Craig Topper 0e8211ea57 [SelectionDAG] Const-correct the DemandedMask argument to one of the overloads of SimplifyDemandedBits. NFC
llvm-svn: 315641
2017-10-12 23:46:05 +00:00
Matthias Braun bb8507e63c Revert "TargetMachine: Merge TargetMachine and LLVMTargetMachine"
Reverting to investigate layering effects of MCJIT not linking
libCodeGen but using TargetMachine::getNameWithPrefix() breaking the
lldb bots.

This reverts commit r315633.

llvm-svn: 315637
2017-10-12 22:57:28 +00:00
Adrian Prantl a2f9e3a60f Deprecate DwarfUnit::addBlockByrefAddress().
The clang frontend already creates a DIExpression that replicates the
logic in addBlockByrefAddress() exactly, thus making this function
effectively unreachable. To guard against human error I'm hereby
marking the function with an assertion and let it hit the bots before
eventually removing it.

rdar://problem/31629055

llvm-svn: 315636
2017-10-12 22:54:36 +00:00
Matthias Braun 3a9c114b24 TargetMachine: Merge TargetMachine and LLVMTargetMachine
Merge LLVMTargetMachine into TargetMachine.

- There is no in-tree target anymore that just implements TargetMachine
  but not LLVMTargetMachine.
- It should still be possible to stub out all the various functions in
  case a target does not want to use lib/CodeGen
- This simplifies the code and avoids methods ending up in the wrong
  interface.

Differential Revision: https://reviews.llvm.org/D38489

llvm-svn: 315633
2017-10-12 22:28:54 +00:00
Craig Topper d52cf99f53 [SelectionDAG] Simplify the ISD::SIGN_EXTEND/ZERO_EXTEND handling to use less temporary APInts by counting bits instead. NFCI
llvm-svn: 315628
2017-10-12 21:58:25 +00:00
Eli Friedman 48e7549691 [DWARF] Fix bad comparator in sortGlobalExprs.
The comparator passed to std::sort must provide a strict weak ordering;
otherwise, the behavior is undefined.

Fixes an assertion failure generating debug info for globals
split by GlobalOpt. I have a testcase, but not sure how to reduce it,
so not included here.  (Someone else came up with a testcase, but I
can't reproduce the crash with it, presumably because my version of LLVM
ends up sorting the array differently.)

This isn't really a complete fix (see the FIXME in the patch), but at
least it doesn't have undefined behavior.

Differential Revision: https://reviews.llvm.org/D38830

llvm-svn: 315619
2017-10-12 20:54:08 +00:00
Wei Ding 5676acad9e Implement custom lowering for ISD::CTTZ_ZERO_UNDEF and ISD::CTTZ.
Differential Revision: http://reviews.llvm.org/D37348

llvm-svn: 315610
2017-10-12 19:37:14 +00:00
Don Hinton 3e0199f7eb [dump] Remove NDEBUG from test to enable dump methods [NFC]
Summary:
Add LLVM_FORCE_ENABLE_DUMP cmake option, and use it along with
LLVM_ENABLE_ASSERTIONS to set LLVM_ENABLE_DUMP.

Remove NDEBUG and only use LLVM_ENABLE_DUMP to enable dump methods.

Move definition of LLVM_ENABLE_DUMP from config.h to llvm-config.h so
it'll be picked up by public headers.

Differential Revision: https://reviews.llvm.org/D38406

llvm-svn: 315590
2017-10-12 16:16:06 +00:00
Diana Picus 4a5f522d4d MachineInstr: Make isEqual agree with getHashValue in MachineInstrExpressionTrait
MachineInstr::isIdenticalTo has a lot of logic for dealing with register
Defs (i.e. deciding whether to take them into account or ignore them).
This logic gets things wrong in some obscure cases, for instance if an
operand is not a Def for both the current MI and the one we are
comparing to.

I'm not sure if it's possible for this to happen for regular register
operands, but it may happen in the ARM backend for special operands
which use sentinel values for the register (i.e. 0, which is neither a
physical register nor a virtual one).

This causes MachineInstrExpressionTrait::isEqual (which uses
MachineInstr::isIdenticalTo) to return true for the following
instructions, which are the same except for the fact that one sets the
flags and the other one doesn't:
%1114 = ADDrsi %1113, %216, 17, 14, _, def _
%1115 = ADDrsi %1113, %216, 17, 14, _, _

OTOH, MachineInstrExpressionTrait::getHashValue returns different values
for the 2 instructions due to the different isDef on the last operand.
In practice this means that when trying to add those instructions to a
DenseMap, they will be considered different because of their different
hash values, but when growing the map we might get an assertion while
copying from the old buckets to the new buckets because isEqual
misleadingly returns true.

This patch makes sure that isEqual and getHashValue agree, by improving
the checks in MachineInstr::isIdenticalTo when we are ignoring virtual
register definitions (which is what the Trait uses). Firstly, instead of
checking isPhysicalRegister, we use !isVirtualRegister, so that we cover
both physical registers and sentinel values. Secondly, instead of
checking MachineOperand::isReg, we use MachineOperand::isIdenticalTo,
which checks isReg, isSubReg and isDef, which are the same values that
the hash function uses to compute the hash.

Note that the function is symmetric with this change, since if the
current operand is not a Def, we check MachineOperand::isIdenticalTo,
which returns false if the operands have different isDef's.

Differential Revision: https://reviews.llvm.org/D38789

llvm-svn: 315579
2017-10-12 13:59:51 +00:00
Daniel Jasper 4d93120273 Reinstantiate old/bad deduplication logic that was removed in r315279.
While this shouldn't be necessary anymore, we have cases where we run
into the assertion below, i.e. cases with two non-fragment entries for the
same variable at different frame indices.

This should be fixed, but for now, we should revert to a version that
does not trigger asserts.

llvm-svn: 315576
2017-10-12 13:25:05 +00:00
Hiroshi Inoue b49b015bed [ScheduleDAGInstrs] fix behavior of getUnderlyingObjectsForCodeGen when no identifiable object found
This patch fixes the bug introduced in https://reviews.llvm.org/D35907; the bug is reported by http://lists.llvm.org/pipermail/llvm-commits/Week-of-Mon-20171002/491452.html.

Before D35907, when GetUnderlyingObjects fails to find an identifiable object, allMMOsOkay lambda in getUnderlyingObjectsForInstr returns false and Objects vector is cleared. This behavior is unintentionally changed by D35907.

This patch makes the behavior for such case same as the previous behavior.
Since D35907 introduced a wrapper function getUnderlyingObjectsForCodeGen around GetUnderlyingObjects, getUnderlyingObjectsForCodeGen is modified to return a boolean value to ask the caller to clear the Objects vector.

Differential Revision: https://reviews.llvm.org/D38735

llvm-svn: 315565
2017-10-12 06:26:04 +00:00
Mikael Holmen a079ef68e3 [RegisterCoalescer] Don't set read-undef in pruneValues, only clear
Summary:
The comments in the code said

 // Remove <def,read-undef> flags. This def is now a partial redef.

but the code didn't just remove read-undef, it could introduce new ones which
could cause errors.

E.g. if we have something like

%vreg1<def> = IMPLICIT_DEF
%vreg2:subreg1<def, read-undef> = op %vreg3, %vreg4
%vreg2:subreg2<def> = op %vreg6, %vreg7

and we merge %vreg1 and %vreg2 then we should not set undef on the second subreg
def, which the old code did.

Now we solve this by actually do what the code comment says. We remove
read-undef flags rather than remove or introduce them.

Reviewers: qcolombet, MatzeB

Reviewed By: MatzeB

Subscribers: llvm-commits

Differential Revision: https://reviews.llvm.org/D38616

llvm-svn: 315564
2017-10-12 06:21:28 +00:00
Wei Mi 1736efd16a Revert r307036 because of PR34919.
llvm-svn: 315540
2017-10-12 00:24:52 +00:00
Lang Hames 2241ffa43c [MC] Have MCObjectStreamer take its MCAsmBackend argument via unique_ptr.
MCObjectStreamer owns its MCCodeEmitter -- this fixes the types to reflect that,
and allows us to remove the last instance of MCObjectStreamer's weird "holding
ownership via someone else's reference" trick.

llvm-svn: 315531
2017-10-11 23:34:47 +00:00
Reid Kleckner 9cdd4df81a [codeview] Implement FPO data assembler directives
Summary:
This adds a set of new directives that describe 32-bit x86 prologues.
The directives are limited and do not expose the full complexity of
codeview FPO data. They are merely a convenience for the compiler to
generate more readable assembly so we don't need to generate tons of
labels in CodeGen. If our prologue emission changes in the future, we
can change the set of available directives to suit our needs. These are
modelled after the .seh_ directives, which use a different format that
interacts with exception handling.

The directives are:
  .cv_fpo_proc _foo
  .cv_fpo_pushreg ebp/ebx/etc
  .cv_fpo_setframe ebp/esi/etc
  .cv_fpo_stackalloc 200
  .cv_fpo_endprologue
  .cv_fpo_endproc
  .cv_fpo_data _foo

I tried to follow the implementation of ARM EHABI CFI directives by
sinking most directives out of MCStreamer and into X86TargetStreamer.
This helps avoid polluting non-X86 code with WinCOFF specific logic.

I used cdb to confirm that this can show locals in parent CSRs in a few
cases, most importantly the one where we use ESI as a frame pointer,
i.e. the one in http://crbug.com/756153#c28

Once we have cdb integration in debuginfo-tests, we can add integration
tests there.

Reviewers: majnemer, hans

Subscribers: aemerson, mgorny, kristof.beyls, llvm-commits, hiraditya

Differential Revision: https://reviews.llvm.org/D38776

llvm-svn: 315513
2017-10-11 21:24:33 +00:00
Florian Hahn e52abba277 [MachineCombiner] Fix initialisation of LastUpdate for incremental update.
Summary:
Fixes a bogus iterator resulting from the removal of a block's first instruction at the point that incremental update is enabled.

Patch by Paul Walker.

Reviewers: fhahn, Gerolf, efriedma, MatzeB

Reviewed By: fhahn

Subscribers: aemerson, javed.absar, llvm-commits

Differential Revision: https://reviews.llvm.org/D38734

llvm-svn: 315502
2017-10-11 20:25:58 +00:00
Vivek Pandya 9590658fb8 [NFC] Convert OptimizationRemarkEmitter old emit() calls to new closure
parameterized emit() calls

Summary: This is not functional change to adopt new emit() API added in r313691.

Reviewed By: anemet

Subscribers: llvm-commits

Differential Revision: https://reviews.llvm.org/D38285

llvm-svn: 315476
2017-10-11 17:12:59 +00:00
Krzysztof Parzyszek 12bdcab59c [Pipeliner] Fix offset value for instrs dependent on post-inc load/stores
The software pipeliner and the packetizer try to break dependence
between the post-increment instruction and the dependent memory
instructions by changing the base register and the offset value.
However, in some cases, the existing logic didn't work properly
and created incorrect offset value.

Patch by Jyotsna Verma.

llvm-svn: 315468
2017-10-11 15:59:51 +00:00
Krzysztof Parzyszek 8f174dde92 [Pipeliner] Improve serialization order for post-increments
The pipeliner is generating a serial sequence that causes poor
register allocation when a post-increment instruction appears
prior to the use of the post-increment register. This occurs when
there is a circular set of dependences involved with a sequence
of instructions in the same cycle. In this case, there is no
serialization of the parallel semantics that will not cause an
additional register to be allocated.

This patch fixes the problem by changing the instructions so that
the post-increment instruction is used by the subsequent
instruction, which enables the register allocator to make a
better decision and not require another register.

Patch by Brendon Cahoon.

llvm-svn: 315466
2017-10-11 15:51:44 +00:00
Sanjay Patel 34fd5eaaf0 [DAGCombiner] convert insertelement of bitcasted vector into shuffle
Eg:
insert v4i32 V, (v2i16 X), 2 --> shuffle v8i16 V', X', {0,1,2,3,8,9,6,7}

This is a generalization of the IR fold in D38316 to handle insertion into a non-undef vector. 
We may want to abandon that one if we can't find value in squashing the more specific pattern sooner.

We're using the existing legal shuffle target hook to avoid AVX512 horror with vXi1 shuffles.

There may be room for improvement in the shuffle lowering here, but that would be follow-up work.

Differential Revision: https://reviews.llvm.org/D38388

llvm-svn: 315460
2017-10-11 14:12:16 +00:00
Alex Bradbury 4d275f0dfe [TargetLowering] Correctly track NumFixedArgs field of CallLoweringInfo
The NumFixedArgs field of CallLoweringInfo is used by
TargetLowering::LowerCallTo to determine whether a given argument is passed
using the vararg calling convention or not (specifically, to set IsFixed for
each ISD::OutputArg).

Firstly, CallLoweringInfo::setLibCallee and CallLoweringInfo::setCallee both
incorrectly set NumFixedArgs based on the _previous_ args list. Secondly,
TargetLowering::LowerCallTo failed to increment NumFixedArgs when modifying
the argument list so a pointer is passed for the return value.

If your backend uses the IsFixed property or directly accesses NumFixedArgs, 
it is _possible_ this change could result in codegen changes (although the 
previous behaviour would have been incorrect). No such cases have been
identified during code review for any in-tree architecture.

Differential Revision: https://reviews.llvm.org/D37898

llvm-svn: 315457
2017-10-11 13:48:45 +00:00
Lang Hames 02d330548d [MC] Have MCObjectStreamer take its MCAsmBackend argument via unique_ptr.
MCObjectStreamer owns its MCAsmBackend -- this fixes the types to reflect that,
and allows us to remove another instance of MCObjectStreamer's weird "holding
ownership via someone else's reference" trick.

llvm-svn: 315410
2017-10-11 01:57:21 +00:00
Justin Bogner fdf9bf4f16 CodeGen: Minor cleanups to use MachineInstr::getMF. NFC
Since r315388 we have a shorter way to say this, so we'll replace
MI->getParent()->getParent() with MI->getMF() in a few places.

llvm-svn: 315390
2017-10-10 23:50:49 +00:00
Justin Bogner ec7cba53e6 CodeGen: Add MachineInstr::getMF(). NFC
Similarly to how Instruction has getFunction, this adds a less verbose
way to write MI->getParent()->getParent(). I'll follow up shortly with
a change that changes a bunch of the uses.

llvm-svn: 315388
2017-10-10 23:34:01 +00:00
Eugene Zelenko 149178d92b [CodeGen] Fix some Clang-tidy modernize and Include What You Use warnings; other minor fixes (NFC).
llvm-svn: 315380
2017-10-10 22:33:29 +00:00
Adrian Prantl 3a3ba77ba3 Convert condition to an early exit (NFC).
<rdar://problem/34689604>

llvm-svn: 315359
2017-10-10 20:33:43 +00:00
David Stuttard 51c1b22806 [DAGCombine] Fix for shuffle to vector extend for non power 2 vectors
Summary:
See https://llvm.org/PR33743 for more details

It seems that for non-power of 2 vector sizes, the algorithm can produce
non-matching sizes for input and result causing an assert.

This usually isn't a problem as the isAnyExtend check will weed these out, but
in some cases (most often with lots of undefined values for the mask indices) it
can pass this check for non power of 2 vectors.

Adding in an extra check that ensures that bit size will match for the result
and input (as required)

Subscribers: nhaehnle

Differential Revision: https://reviews.llvm.org/D35241

llvm-svn: 315307
2017-10-10 12:45:45 +00:00
Bjorn Steinbrink d36bbe9c89 Ignore all duplicate frame index expression
Some passes might duplicate calls to llvm.dbg.declare creating
duplicate frame index expression which currently trigger an assertion
which is meant to catch erroneous, overlapping fragment declarations.
But identical frame index expressions are just redundant and don't
actually conflict with each other, so we can be more lenient and just
ignore the duplicates.

Reviewers: aprantl, rnk

Subscribers: llvm-commits, JDevlieghere

Differential Revision: https://reviews.llvm.org/D38540

llvm-svn: 315279
2017-10-10 07:46:17 +00:00
Adam Nemet 0965da2055 Rename OptimizationDiagnosticInfo.* to OptimizationRemarkEmitter.*
Sync it up with the name of the class actually defined here.  This has been
bothering me for a while...

llvm-svn: 315249
2017-10-09 23:19:02 +00:00
Aditya Nandakumar c3bfc81a1f [GISel]: Fix generation of illegal COPYs during CallLowering
We end up creating COPY's that are either truncating/extending and this
should be illegal.

https://reviews.llvm.org/D37640

Patch for X86 and ARM by igorb, rovka

llvm-svn: 315240
2017-10-09 20:07:43 +00:00
Sanjay Patel 2a61a821a0 [DAG] combine assertsexts around a trunc
This was a suggested follow-up to:
D37017 / https://reviews.llvm.org/rL313577

llvm-svn: 315206
2017-10-09 15:22:20 +00:00
Benjamin Kramer 16610028ea Remove unused variables. No functionality change.
llvm-svn: 315185
2017-10-08 19:11:02 +00:00
Craig Topper 90b76211d3 [SelectionDAG} Use KnownBits::isUnknown and hasConflict. NFC
llvm-svn: 315154
2017-10-07 17:07:48 +00:00
Jessica Paquette 13593843f6 [MachineOutliner] Disable outlining from LinkOnceODRs by default
Say you have two identical linkonceodr functions, one in M1 and one in M2.
Say that the outliner outlines A,B,C from one function, and D,E,F from another
function (where letters are instructions). Now those functions are not
identical, and cannot be deduped. Locally to M1 and M2, these outlining
choices would be good-- to the whole program, however, this might not be true!

To mitigate this, this commit makes it so that the outliner sees linkonceodr
functions as unsafe to outline from. It also adds a flag,
-enable-linkonceodr-outlining, which allows the user to specify that they
want to outline from such functions when they know what they're doing.

Changing this handles most code size regressions in the test suite caused by
competing with linker dedupe. It also doesn't have a huge impact on the code
size improvements from the outliner. There are 6 tests that regress > 5% from
outlining WITH linkonceodrs to outlining WITHOUT linkonceodrs. Overall, most
tests either improve or are not impacted.

Not outlined vs outlined without linkonceodrs:
https://hastebin.com/raw/qeguxavuda

Not outlined vs outlined with linkonceodrs:
https://hastebin.com/raw/edepoqoqic

Outlined with linkonceodrs vs outlined without linkonceodrs:
https://hastebin.com/raw/awiqifiheb

Numbers generated using compare.py with -m size.__text. Tests run for AArch64
with -Oz -mllvm -enable-machine-outliner -mno-red-zone.

llvm-svn: 315136
2017-10-07 00:16:34 +00:00
Reid Kleckner 813c577cc2 [PEI] Remove required properties and use 'if' instead of std::function
Summary:
After r303360, we initialize UsesCalleeSaves in runOnMachineFunction,
which runs after getRequiredProperties. UsesCalleeSaves was initialized
to 'false', so getRequiredProperties would always return an empty set.
We don't have a TargetMachine available early anymore after r303360.
Just removing the requirement of NoVRegs seems to make things work, so
let's do that.

Reviewers: thegameg, dschuff, MatzeB

Subscribers: hiraditya, llvm-commits

Differential Revision: https://reviews.llvm.org/D38597

llvm-svn: 315089
2017-10-06 18:21:19 +00:00
Xin Tong 27e66fb579 [MBP] Remove an invalid assert.
The patch that this assert comes with is fixing a bug in MBP. The assert is
invalid however.

Thanks to @sergey.k.okunev for finding this

Currently this fails SPECCPU2006 LTO. I will add a test case when I do more
investigation and have one.

llvm-svn: 315032
2017-10-05 23:00:04 +00:00
Karl-Johan Karlsson 8d8d201c17 [DebugInfo] Insert DEBUG_VALUEs after each register redefinition
Summary:
When reinserting debug values after register allocation, make sure to
insert debug values after each redefinition of debug value register in
the slot index range. The reason for this is that DwarfDebug will end
the range of a debug variable when the physical reg is defined. For
instructions with e.g. tied operands this result in prematurely ended
debug range.

This resolves pr34545

Patch by Karl-Johan Karlsson and Bjorn Pettersson

Reviewers: rnk, aprantl

Reviewed By: rnk

Subscribers: bjope, llvm-commits

Differential Revision: https://reviews.llvm.org/D38229

llvm-svn: 314974
2017-10-05 08:37:31 +00:00
Mikael Holmen 0ec1d25d33 Minor refactoring regarding Cast::isNoopCast(), NFC
Summary:
FastISel::hasTrivialKill() was the only user of the "IntPtrTy" version of
Cast::isNoopCast(). According to review comments in D37894 we could instead
use the "DataLayout" version of the method, and thus get rid of the
"IntPtrTy" versions of isNoopCast() completely.

With the above done, the remaining isNoopCast() could then be simplified
a bit more.

Reviewers: arsenm

Reviewed By: arsenm

Subscribers: wdng, llvm-commits

Differential Revision: https://reviews.llvm.org/D38497

llvm-svn: 314969
2017-10-05 07:07:09 +00:00
Xin Tong d8d97972de [MachineBlockPlacement] Make sure PreferredLoopExit is cleared everytime new loop is processed
Summary: Rotate on exit that actually exits the current loop.

Reviewers: davidxl, danielcdh, iteratee, chandlerc

Subscribers: llvm-commits

Differential Revision: https://reviews.llvm.org/D38563

llvm-svn: 314937
2017-10-04 21:39:25 +00:00
Sanjay Patel 4c33d5213b [SimplifyCFG] put the optional assumption cache pointer in the options struct; NFCI
This is a follow-up to https://reviews.llvm.org/D38138. 

I fixed the capitalization of some functions because we're changing those
lines anyway and that helped verify that we weren't accidentally dropping 
any options by using default param values.

llvm-svn: 314930
2017-10-04 20:26:25 +00:00
Hans Wennborg 2a6c9adb2f Revert r314886 "[X86] Improvement in CodeGen instruction selection for LEAs (re-applying post required revision changes.)"
It broke the Chromium / SQLite build; see PR34830.

> Summary:
>    1/  Operand folding during complex pattern matching for LEAs has been
>        extended, such that it promotes Scale to accommodate similar operand
>        appearing in the DAG.
>        e.g.
>          T1 = A + B
>          T2 = T1 + 10
>          T3 = T2 + A
>        For above DAG rooted at T3, X86AddressMode will no look like
>          Base = B , Index = A , Scale = 2 , Disp = 10
>
>    2/  During OptimizeLEAPass down the pipeline factorization is now performed over LEAs
>        so that if there is an opportunity then complex LEAs (having 3 operands)
>        could be factored out.
>        e.g.
>          leal 1(%rax,%rcx,1), %rdx
>          leal 1(%rax,%rcx,2), %rcx
>        will be factored as following
>          leal 1(%rax,%rcx,1), %rdx
>          leal (%rdx,%rcx)   , %edx
>
>    3/ Aggressive operand folding for AM based selection for LEAs is sensitive to loops,
>       thus avoiding creation of any complex LEAs within a loop.
>
> Reviewers: lsaba, RKSimon, craig.topper, qcolombet, jmolloy
>
> Reviewed By: lsaba
>
> Subscribers: jmolloy, spatel, igorb, llvm-commits
>
>     Differential Revision: https://reviews.llvm.org/D35014

llvm-svn: 314919
2017-10-04 17:54:06 +00:00
Adam Nemet 6c381b7a2e [OptRemark] Move YAML writing to IR
Before the patch this was in Analysis.  Moving it to IR and making it implicit
part of LLVMContext::diagnose allows the full opt-remark facility to be used
outside passes e.g. the pass manager.  Jessica is planning to use this to
report function size after each pass.  The same could be used for time
reports.

Tested with BUILD_SHARED_LIBS=On.

llvm-svn: 314909
2017-10-04 15:18:11 +00:00
Adam Nemet f1bea0a6ab Also update MachineORE after r314874.
llvm-svn: 314908
2017-10-04 15:18:07 +00:00
Jatin Bhateja 3c29bacd43 [X86] Improvement in CodeGen instruction selection for LEAs (re-applying post required revision changes.)
Summary:
   1/  Operand folding during complex pattern matching for LEAs has been
       extended, such that it promotes Scale to accommodate similar operand
       appearing in the DAG.
       e.g.
         T1 = A + B
         T2 = T1 + 10
         T3 = T2 + A
       For above DAG rooted at T3, X86AddressMode will no look like
         Base = B , Index = A , Scale = 2 , Disp = 10

   2/  During OptimizeLEAPass down the pipeline factorization is now performed over LEAs
       so that if there is an opportunity then complex LEAs (having 3 operands)
       could be factored out.
       e.g.
         leal 1(%rax,%rcx,1), %rdx
         leal 1(%rax,%rcx,2), %rcx
       will be factored as following
         leal 1(%rax,%rcx,1), %rdx
         leal (%rdx,%rcx)   , %edx

   3/ Aggressive operand folding for AM based selection for LEAs is sensitive to loops,
      thus avoiding creation of any complex LEAs within a loop.

Reviewers: lsaba, RKSimon, craig.topper, qcolombet, jmolloy

Reviewed By: lsaba

Subscribers: jmolloy, spatel, igorb, llvm-commits

    Differential Revision: https://reviews.llvm.org/D35014

llvm-svn: 314886
2017-10-04 09:02:10 +00:00
Mikael Holmen a1a3f5c5e6 Recommit [UnreachableBlockElim] Use COPY if PHI input is undef
This time invoking llc with "-march=x86-64" in the testcase, so we don't assume
the default target is x86.

Summary:
If we have

    %vreg0<def> = PHI %vreg2<undef>, <BB#0>, %vreg3, <BB#2>; GR32:%vreg0,%vreg2,%vreg3
    %vreg3<def,tied1> = ADD32ri8 %vreg0<kill,tied0>, 1, %EFLAGS<imp-def>; GR32:%vreg3,%vreg0

then we can't just change %vreg0 into %vreg3, since %vreg2 is actually
undef. We would have to also copy the undef flag to be able to change the
register.

Instead we deal with this case like other cases where we can't just
replace the register: we insert a COPY. The code creating the COPY already
copied all flags from the PHI input, so the undef flag will be transferred
as it should.

Reviewers: kparzysz

Reviewed By: kparzysz

Subscribers: llvm-commits

Differential Revision: https://reviews.llvm.org/D38235

llvm-svn: 314882
2017-10-04 07:42:45 +00:00
Mikael Holmen 75b1992f78 Revert r314879 "[UnreachableBlockElim] Use COPY if PHI input is undef"
Build-bots broke on the new testcase. I'll investigate and fix.

llvm-svn: 314880
2017-10-04 06:39:22 +00:00
Mikael Holmen 65eb2f394c [UnreachableBlockElim] Use COPY if PHI input is undef
Summary:
If we have

    %vreg0<def> = PHI %vreg2<undef>, <BB#0>, %vreg3, <BB#2>; GR32:%vreg0,%vreg2,%vreg3
    %vreg3<def,tied1> = ADD32ri8 %vreg0<kill,tied0>, 1, %EFLAGS<imp-def>; GR32:%vreg3,%vreg0

then we can't just change %vreg0 into %vreg3, since %vreg2 is actually
undef. We would have to also copy the undef flag to be able to change the
register.

Instead we deal with this case like other cases where we can't just
replace the register: we insert a COPY. The code creating the COPY already
copied all flags from the PHI input, so the undef flag will be transferred
as it should.

Reviewers: kparzysz

Reviewed By: kparzysz

Subscribers: llvm-commits

Differential Revision: https://reviews.llvm.org/D38235

llvm-svn: 314879
2017-10-04 06:06:31 +00:00
Jessica Paquette acc15e1265 [MachineOutliner] Fix off-by-one in cost model
This commit does two things. Firstly, it cleans up some of the benefit
calculation wrt outlined functions and candidates. Secondly, it fixes an
off-by-one bug in the cost model which was caused by the benefit value of
an OutlinedFunction and Candidate differing by 1. It updates the remarks test
to reflect this change.

llvm-svn: 314836
2017-10-03 20:32:55 +00:00
Reid Kleckner b4569de739 Implement David Blaikie's suggestion for comparison operators
llvm-svn: 314822
2017-10-03 18:30:11 +00:00
Reid Kleckner 04e25e00b7 [DebugInfo] Correctly coalesce DBG_VALUEs that mix direct and indirect values
Summary:
This should fix a regression introduced by r313786, which switched from
MachineInstr::isIndirectDebugValue() to checking if operand 1 is an
immediate. I didn't have a test case for it until now.

A single UserValue, which approximates a user variable, may have many
DBG_VALUE instructions that disagree about whether the variable is in
memory or in a virtual register. This will become much more common once
we have llvm.dbg.addr, but you can construct such a test case manually
today with llvm.dbg.value.

Before this change, we would get two UserValues: one for direct and one
for indirect DBG_VALUE instructions describing the same variable. If we
build separate interval maps for direct and indirect locations, we will
end up accidentally coalescing identical DBG_VALUE intervals that need
to remain separate because they are broken up by intervals of the
opposite direct-ness.

Reviewers: aprantl

Subscribers: llvm-commits, hiraditya

Differential Revision: https://reviews.llvm.org/D37932

llvm-svn: 314819
2017-10-03 17:59:02 +00:00
Geoff Berry fabedbad11 Revert "Re-enable "[MachineCopyPropagation] Extend pass to do COPY source forwarding""
This reverts commit r314729.

Another bug has been encountered in an out-of-tree target reported by Quentin.

llvm-svn: 314814
2017-10-03 16:59:13 +00:00
John Brawn 736bf0020f [CGP] Make optimizeMemoryInst capable of handling multiple AddrModes
Currently optimizeMemoryInst requires that all of the AddrModes it sees are
identical. This patch makes it capable of tracking multiple AddrModes, so long
as they differ in at most one field.

This patch does nothing by itself, but later patches will make use of it to
insert or reuse phi or select instructions for the differing fields.

Differential Revision: https://reviews.llvm.org/D38278

llvm-svn: 314795
2017-10-03 13:08:22 +00:00
John Brawn eb83c7554e [CGP] In optimizeMemoryInst handle select similarly to phi
This lets us optimize away selects that perform the same address computation in
two different ways and is also the first step towards being able to handle
selects between two different, but compatible, address computations.

Differential Revision: https://reviews.llvm.org/D38242

llvm-svn: 314794
2017-10-03 13:04:15 +00:00
Sam Clegg b2b019f727 [WebAssembly] MC: Support for init_array and fini_array
Differential Revision: https://reviews.llvm.org/D37757

llvm-svn: 314783
2017-10-03 11:20:28 +00:00
Bjorn Pettersson 90cc1b53d0 [DebugInfo] Handle endianness when moving debug info for split integer values (reapplied)
Summary:
Take the target's endianness into account when splitting the
debug information in DAGTypeLegalizer::SetExpandedInteger.

This patch fixes so that, for big-endian targets, the fragment
expression corresponding to the high part of a split integer
value is placed at offset 0, in order to correctly represent
the memory address order.

I have attached a PPC32 reproducer where the resulting DWARF
pieces for a 64-bit integer were incorrectly reversed.

Original patch was reverted due to using -stop-after=isel in
the test case (but that is only working when AMDGPU target
is included in the llc build). The test case has now been
updated to use -stop-before=expand-isel-pseudos instead.

Patch by: dstenb

Reviewers: JDevlieghere, aprantl, dblaikie

Reviewed By: JDevlieghere, aprantl, dblaikie

Subscribers: nemanjai

Differential Revision: https://reviews.llvm.org/D38172

llvm-svn: 314781
2017-10-03 11:03:02 +00:00
Javed Absar e485b143ea [MiSched] - Simplify ProcResEntry access
Reviewed by: @MatzeB
Differential Revision: https://reviews.llvm.org/D38447

llvm-svn: 314775
2017-10-03 09:35:04 +00:00
Sjoerd Meijer 7a22a4948f ISel type legalization: add debug messages. NFCI.
This adds some more debug messages to the type legalizer and functions
like PromoteNode, ExpandNode, ExpandLibCall in an attempt to make
the debug messages a little bit more informative and useful.

Differential Revision: https://reviews.llvm.org/D38450

llvm-svn: 314773
2017-10-03 08:54:15 +00:00
Quentin Colombet c2f3cea608 [Legalizer] Add support for G_OR NarrowScalar.
Legalize bitwise OR:
 A = BinOp<Ty> B, C
into:
 B1, ..., BN = G_UNMERGE_VALUES B
 C1, ..., CN = G_UNMERGE_VALUES C
 A1 = BinOp<Ty/N> B1, C2
 ...
 AN = BinOp<Ty/N> BN, CN
 A = G_MERGE_VALUES A1, ..., AN

llvm-svn: 314760
2017-10-03 04:53:56 +00:00
Tim Shen 59465d29f8 [PowerPC] Revert r314666.
See https://reviews.llvm.org/D38172.

I tried to XFAIL it, but sometimes XPASS triggers the bot. Simply
revert it.

llvm-svn: 314739
2017-10-02 23:20:06 +00:00
Geoff Berry bfc5fb4571 Re-enable "[MachineCopyPropagation] Extend pass to do COPY source forwarding"
Issues addressed since original review:
- Avoid bug in regalloc greedy/machine verifier when forwarding to use
  in an instruction that re-defines the same virtual register.
- Fixed bug when forwarding to use in EarlyClobber instruction slot.
- Fixed incorrect forwarding to register definitions that showed up in
  explicit_uses() iterator (e.g. in INLINEASM).
- Moved removal of dead instructions found by
  LiveIntervals::shrinkToUses() outside of loop iterating over
  instructions to avoid instructions being deleted while pointed to by
  iterator.
- Fixed ARMLoadStoreOptimizer bug exposed by this change in r311907.
- The pass no longer forwards COPYs to physical register uses, since
  doing so can break code that implicitly relies on the physical
  register number of the use.
- The pass no longer forwards COPYs to undef uses, since doing so
  can break the machine verifier by creating LiveRanges that don't
  end on a use (since the undef operand is not considered a use).

  [MachineCopyPropagation] Extend pass to do COPY source forwarding

  This change extends MachineCopyPropagation to do COPY source forwarding.

  This change also extends the MachineCopyPropagation pass to be able to
  be run during register allocation, after physical registers have been
  assigned, but before the virtual registers have been re-written, which
  allows it to remove virtual register COPY LiveIntervals that become dead
  through the forwarding of all of their uses.

llvm-svn: 314729
2017-10-02 22:01:37 +00:00
Michael Liao 70356574bb Remove trailing whitespace to trigger re-cmaking
llvm-svn: 314728
2017-10-02 21:54:38 +00:00
Stanislav Mekhanoshin 6deb212522 Eliminate ftrunc if source is know to be rounded
Differential Revision: https://reviews.llvm.org/D38421

llvm-svn: 314688
2017-10-02 16:57:07 +00:00
Sanjay Patel 232669d6b3 use range-for-loops; NFCI
llvm-svn: 314676
2017-10-02 15:02:06 +00:00
Sanjay Patel 7998d37076 remove duplicate comments, reposition related functions; NFC
llvm-svn: 314669
2017-10-02 14:03:17 +00:00
Bjorn Pettersson 839775a277 [Debug info] Handle endianness when moving debug info for split integer values
Summary:
Take the target's endianness into account when splitting the
debug information in DAGTypeLegalizer::SetExpandedInteger.

This patch fixes so that, for big-endian targets, the fragment
expression corresponding to the high part of a split integer
value is placed at offset 0, in order to correctly represent
the memory address order.

I have attached a PPC32 reproducer where the resulting DWARF
pieces for a 64-bit integer were incorrectly reversed.

Patch by: dstenb

Reviewers: JDevlieghere, aprantl, dblaikie

Reviewed By: JDevlieghere, aprantl, dblaikie

Subscribers: nemanjai

Differential Revision: https://reviews.llvm.org/D38172

llvm-svn: 314666
2017-10-02 12:46:32 +00:00
Yaxun Liu b33607e5a1 CodeGen: Fix pointer info in expandUnalignedLoad/Store
Currently expandUnalignedLoad/Store uses place holder pointer info for temporary memory operand
in stack, which does not have correct address space. This causes unaligned private double16 load/store to be
lowered to flat_load instead of buffer_load for amdgcn target.

This fixes failures of OpenCL conformance test basic/vload_private/vstore_private on target amdgcn---amdgizcl.

Differential Revision: https://reviews.llvm.org/D35361

llvm-svn: 314566
2017-09-29 23:31:14 +00:00
Eugene Zelenko 4f81cdd818 [CodeGen] Fix some Clang-tidy modernize-use-using and Include What You Use warnings; other minor fixes (NFC).
llvm-svn: 314559
2017-09-29 21:55:49 +00:00
Jonas Paulsson c9e363ac69 [SystemZ] implement shouldCoalesce()
Implement shouldCoalesce() to help regalloc avoid running out of GR128
registers.

If a COPY involving a subreg of a GR128 is coalesced, the live range of the
GR128 virtual register will be extended. If this happens where there are
enough phys-reg clobbers present, regalloc will run out of registers (if
there is not a single GR128 allocatable register available).

This patch tries to allow coalescing only when it can prove that this will be
safe by checking the (local) interval in question.

Review: Ulrich Weigand, Quentin Colombet
https://reviews.llvm.org/D37899
https://bugs.llvm.org/show_bug.cgi?id=34610

llvm-svn: 314516
2017-09-29 14:31:39 +00:00
Jessica Paquette 919991690c [MachineOutliner][NFC] Simplify logic in pruneCandidates
This commit yanks out the repeated sections of code in pruneCandidates into
two lambdas: ShouldSkipCandidate and Prune. This simplifies the logic in
pruneCandidates significantly, and reduces the chance of introducing bugs by
folding all of the shared logic into one place.

llvm-svn: 314475
2017-09-28 23:39:36 +00:00
Matthias Braun 5c3e8a450e MIR: Serialize CaleeSavedInfo Restored flag
llvm-svn: 314449
2017-09-28 18:52:14 +00:00
Adrian Prantl 99fdb9d927 llvm-dwarfdump: implement --find for .apple_names
This patch implements the dwarfdump option --find=<name>.  This option
looks for a DIE in the accelerator tables and dumps it if found.  This
initial patch only adds support for .apple_names to keep the review
small, adding the other sections and pubnames support should be
trivial though.

Differential Revision: https://reviews.llvm.org/D38282

llvm-svn: 314439
2017-09-28 18:10:52 +00:00
Bjorn Pettersson 715a5efaad [DebugInfo] Do not extend range for physreg in LiveDebugVariables
Summary:
A DBG_VALUE that is referring to a physical register is
valid up until the next def of the register, or the end
of the basic block that it belongs to.

LiveDebugVariables is computing live intervals (slot index
ranges) for DBG_VALUE instructions, before regalloc, in order
to be able to re-insert DBG_VALUE instructions again after
regalloc. When the DBG_VALUE is mapping a variable to a
physical register we do not need to compute the range. We
should simply re-insert the DBG_VALUE at the start position.

The problem that was found, resulting in this patch, was a
situation when the DBG_VALUE was the last real use of the
physical register. The computeIntervals/extendDef methods
extended the range to cover the whole basic block, even though
the physical register very well could be allocated to some
virtual register inside the basic block. So the extended
range could not be trusted.

This patch is a preparation for https://reviews.llvm.org/D38229,
where the goal is to insert DBG_VALUE after each new definition
of a variable, even if the virtual registers that the variable
was connected to has been coalesced into using the same physical
register (e.g. due to two address instructions). For more info
see https://bugs.llvm.org/show_bug.cgi?id=34545

Reviewers: aprantl, rnk, echristo

Reviewed By: aprantl

Subscribers: Ka-Ka, llvm-commits

Differential Revision: https://reviews.llvm.org/D38140

llvm-svn: 314414
2017-09-28 13:10:06 +00:00
Alex Bradbury 5518cbfc41 Teach TargetInstrInfo::getInlineAsmLength to parse .space directives with integer arguments
It's currently quite difficult to test passes like branch relaxation, which
requires branches with large displacement to be generated. The .space assembler
directive makes it easy to create arbitrarily large basic blocks, but
getInlineAsmLength is not able to parse it and so the size of the block is not
correctly estimated. Other backends (AArch64, AMDGPU) introduce options just
for testing that artificially restrict the ranges of branch instructions (e.g.
aarch64-tbz-offset-bits). Although parsing a single form of the .space
directive feels inelegant, it does allow a more direct testing approach.

This patch adapts the .space parsing code from
Mips16InstrInfo::getInlineAsmLength and removes it now the extra functionality
is provided by the base implementation. I want to move this functionality to
the generic getInlineAsmLength as 1) I need the same for RISC-V, and 2) I feel
other backends will benefit from more direct testing of large branch
displacements.

Differential Revision: https://reviews.llvm.org/D37798

llvm-svn: 314393
2017-09-28 09:31:46 +00:00
Mikael Holmen 07f1e2e2b3 [RegAllocGreedy]: Allow recoloring of done register if it's non-tied
Summary:
If we have a non-allocated register, we allow us to try recoloring of an
already allocated and "Done" register, even if they are of the same
register class, if the non-allocated register has at least one tied def
and the allocated one has none.

It should be easier to recolor the non-tied register than the tied one, so
it might be an improvement even if they use the same regclasses.

Reviewers: qcolombet

Reviewed By: qcolombet

Subscribers: llvm-commits, MatzeB

Differential Revision: https://reviews.llvm.org/D38309

llvm-svn: 314388
2017-09-28 08:22:35 +00:00
George Burgess IV f8e11b803d [DAGCombiner] Fix an off-by-one error in vector logic
Without this, we could end up trying to get the Nth (0-indexed) element
from a subvector of size N.

Differential Revision: https://reviews.llvm.org/D37880

llvm-svn: 314380
2017-09-28 06:17:19 +00:00
Eugene Zelenko fa57bd0ced [CodeGen] Fix some Clang-tidy modernize-use-default-member-init and Include What You Use warnings; other minor fixes (NFC).
llvm-svn: 314363
2017-09-27 23:26:01 +00:00
Don Hinton 53eb637115 Cleanup some problems with LLVM_ENABLE_DUMP in release builds, and
always set LLVM_ENABLE_DUMP=ON for +Asserts builds.

Differential Revision: https://reviews.llvm.org/D38306

llvm-svn: 314346
2017-09-27 21:19:56 +00:00
Jessica Paquette 4cf187b5b4 [MachineOutliner] AArch64: Avoid saving + restoring LR if possible
This commit allows the outliner to avoid saving and restoring the link register
on AArch64 when it is dead within an entire class of candidates.

This introduces changes to the way the outliner interfaces with the target.
For example, the target now interfaces with the outliner using a
MachineOutlinerInfo struct rather than by using getOutliningCallOverhead and
getOutliningFrameOverhead.

This also improves several comments on the outliner's cost model.

https://reviews.llvm.org/D36721

llvm-svn: 314341
2017-09-27 20:47:39 +00:00
Than McIntosh dee2cf67ea [CodeGen] Emit necessary .note sections for -fsplit-stack
Summary:
According to https://gcc.gnu.org/wiki/SplitStacks, the linker expects a zero-sized .note.GNU-split-stack section if split-stack is used (and also .note.GNU-no-split-stack section if it also contains non-split-stack functions), so it can handle the cases where a split-stack function calls non-split-stack function.

This change adds the sections if needed.

Fixes PR #34670.

Reviewers: thanm, rnk, luqmana

Reviewed By: rnk

Subscribers: llvm-commits

Patch by Cherry Zhang <cherryyz@google.com>

Differential Revision: https://reviews.llvm.org/D38051

llvm-svn: 314335
2017-09-27 19:34:00 +00:00
Sanjay Patel 0f9b4773c1 [SimplifyCFG] add a struct to house optional folds (PR34603)
This was intended to be no-functional-change, but it's not - there's a test diff.

So I thought I should stop here and post it as-is to see if this looks like what was expected 
based on the discussion in PR34603:
https://bugs.llvm.org/show_bug.cgi?id=34603

Notes:
 1. The test improvement occurs because the existing 'LateSimplifyCFG' marker is not carried 
    through the recursive calls to 'SimplifyCFG()->SimplifyCFGOpt().run()->SimplifyCFG()'. 
    The parameter isn't passed down, so we pick up the default value from the function signature 
    after the first level. I assumed that was a bug, so I've passed 'Options' down in all of the 
    'SimplifyCFG' calls.

 2. I split 'LateSimplifyCFG' into 2 bits: ConvertSwitchToLookupTable and KeepCanonicalLoops. 
    This would theoretically allow us to differentiate the transforms controlled by those params 
    independently.

 3. We could stash the optional AssumptionCache pointer and 'LoopHeaders' pointer in the struct too. 
    I just stopped here to minimize the diffs.

 4. Similarly, I stopped short of messing with the pass manager layer. I have another question that 
    could wait for the follow-up: why is the new pass manager creating the pass with LateSimplifyCFG 
    set to true no matter where in the pipeline it's creating SimplifyCFG passes?

    // Create an early function pass manager to cleanup the output of the
    // frontend.
    EarlyFPM.addPass(SimplifyCFGPass());

    -->

    /// \brief Construct a pass with the default thresholds
    /// and switch optimizations.
    SimplifyCFGPass::SimplifyCFGPass()
       : BonusInstThreshold(UserBonusInstThreshold),
         LateSimplifyCFG(true) {}   <-- switches get converted to lookup tables and loops may not be in canonical form

    If this is unintended, then it's possible that the current behavior of dropping the 'LateSimplifyCFG' 
    setting via recursion was masking this bug.

Differential Revision: https://reviews.llvm.org/D38138

llvm-svn: 314308
2017-09-27 14:54:16 +00:00
Mikael Holmen 3bcc9f0c1f [RegAllocGreedy] Fix spelling error, "inteference" -> "interference", NFC
llvm-svn: 314299
2017-09-27 11:27:50 +00:00
Javed Absar 1a77bcc0d2 [Misched]: Remove double call getMicroOpFactor.NFC.
Reviewed by: @MatzeB
Differential Revision: https://reviews.llvm.org/D38176

llvm-svn: 314296
2017-09-27 10:31:58 +00:00
Craig Topper 1c781338ee [SelectionDAG] Make NewSDValueDbgMsg print target specific nodes correctly by passing in the SelectionDAG.
llvm-svn: 314271
2017-09-27 05:17:14 +00:00
Craig Topper 5bc10ede53 [SelectionDAG] Teach simplifyDemandedBits to handle shifts by constant splat vectors
This teach simplifyDemandedBits to handle constant splat vector shifts.

This required changing some uses of getZExtValue to getLimitedValue since we can't rely on legalization using getShiftAmountTy for the shift amount.

I believe there may have been a bug in the ((X << C1) >>u ShAmt) handling where we didn't check if the inner shift was too large. I've fixed that here.

I had to add new patterns to ARM because the zext/sext the patterns were trying to look for got turned into an any_extend with this patch. Happy to split that out too, but not sure how to test without this change.

Differential Revision: https://reviews.llvm.org/D37665

llvm-svn: 314139
2017-09-25 19:26:08 +00:00
Reid Kleckner 8898cd8dcf [DebugInfo] Sort the SDDbgValue list before assuming it is in IR order
Summary:
This code iterates the 'Orders' vector in parallel with the DbgValue
list, emitting all DBG_VALUEs that occurred between the last IR order
insertion point and the next insertion point. This assumes the
SDDbgValue list is sorted in IR order, which it usually is. However, it
is not sorted when a node with a debug value is replaced with another
one. When this happens, TransferDbgValues is called, and the new value
is added to the end of the list.

The problem can be solved by stably sorting the list by IR order.

Reviewers: aprantl, Ka-Ka

Reviewed By: aprantl

Subscribers: MatzeB, hiraditya, llvm-commits

Differential Revision: https://reviews.llvm.org/D38197

llvm-svn: 314114
2017-09-25 16:14:53 +00:00
Reid Kleckner 09e75c9399 Use {} instead of make_pair and an iterator for the insertion point, NFC
llvm-svn: 314113
2017-09-25 16:14:39 +00:00
Clement Courbet 2807c0a442 [CodeGenPrepare][NFC] Rename TargetTransformInfo::expandMemCmp -> TargetTransformInfo::enableMemCmpExpansion.
Summary:
Right now there are two functions with the same name, one does the work
and the other one returns true if expansion is needed. Rename
TargetTransformInfo::expandMemCmp to make it more consistent with other
members of TargetTransformInfo.

Remove the unused Instruction* parameter.

Differential Revision: https://reviews.llvm.org/D38165

llvm-svn: 314096
2017-09-25 06:35:16 +00:00
Eugene Zelenko 8e30a1c607 [CodeGen] Fix build bots which uses old Clang broken in r314046. (NFC)
llvm-svn: 314049
2017-09-22 23:55:32 +00:00
Eugene Zelenko f193332994 [CodeGen] Fix some Clang-tidy modernize-use-default-member-init and Include What You Use warnings; other minor fixes (NFC).
llvm-svn: 314046
2017-09-22 23:46:57 +00:00
Tim Shen cee7536188 [XRay] support conditional return on PPC.
Summary: Conditional returns were not taken into consideration at all. Implement them by turning them into jumps and normal returns. This means there is a slightly higher performance penalty for conditional returns, but this is the best we can do, and it still disturbs little of the rest.

Reviewers: dberris, echristo

Subscribers: sanjoy, nemanjai, hiraditya, kbarton, llvm-commits

Differential Revision: https://reviews.llvm.org/D38102

llvm-svn: 314005
2017-09-22 18:30:02 +00:00
Eugene Zelenko fb7f792f55 [CodeGen] Fix some Clang-tidy modernize-use-bool-literals and Include What You Use warnings; other minor fixes (NFC).
llvm-svn: 313941
2017-09-21 23:20:16 +00:00
Craig Topper de4379251e [DAGCombiner] Slightly simplify some code by using APInt::isMask() and countTrailingOnes instead of getting active bits and checking if all the bits below that make a mask.
At least for the 64-bit and less case, we should be able to determine if we even have a mask without counting any bits. This also removes the need to explicitly check for 0 active bits, isMask will return false for 0.

llvm-svn: 313908
2017-09-21 20:12:19 +00:00
Reid Kleckner 0fe506bc5e Re-land r313825: "[IR] Add llvm.dbg.addr, a control-dependent version of llvm.dbg.declare"
The fix is to avoid invalidating our insertion point in
replaceDbgDeclare:
     Builder.insertDeclare(NewAddress, DIVar, DIExpr, Loc, InsertBefore);
+    if (DII == InsertBefore)
+      InsertBefore = &*std::next(InsertBefore->getIterator());
     DII->eraseFromParent();

I had to write a unit tests for this instead of a lit test because the
use list order matters in order to trigger the bug.

The reduced C test case for this was:
  void useit(int*);
  static inline void inlineme() {
    int x[2];
    useit(x);
  }
  void f() {
    inlineme();
    inlineme();
  }

llvm-svn: 313905
2017-09-21 19:52:03 +00:00
Bjorn Pettersson 0dde08c3cb [SelectionDAG] Pick correct frame index in LowerArguments
Summary:
SelectionDAGISel::LowerArguments is associating arguments
with frame indices (FuncInfo->setArgumentFrameIndex). That
information is later on used by EmitFuncArgumentDbgValue to
create DBG_VALUE instructions that denotes that a variable
can be found on the stack.

I discovered that for our (big endian) out-of-tree target
the association created by SelectionDAGISel::LowerArguments
sometimes is wrong. I've seen this happen when a 64-bit value
is passed on the stack. The argument will occupy two stack
slots (frame index X, and frame index X+1). The fault is
that a call to setArgumentFrameIndex is associating the
64-bit argument with frame index X+1. The effect is that the
debug information (DBG_VALUE) will point at the least significant
part of the arguement on the stack. When printing the
argument in a debugger I will get the wrong value.

I managed to create a test case for PowerPC that seems to
show the same kind of problem.

The bugfix will look at the datalayout, taking endianness into
account when examining a BUILD_PAIR node, assuming that the
least significant part is in the first operand of the BUILD_PAIR.
For big endian targets we should use the frame index from
the second operand, as the most significant part will be stored
at the lower address (using the highest frame index).

Reviewers: bogner, rnk, hfinkel, sdardis, aprantl

Reviewed By: aprantl

Subscribers: nemanjai, aprantl, llvm-commits, igorb

Differential Revision: https://reviews.llvm.org/D37740

llvm-svn: 313901
2017-09-21 18:52:08 +00:00
Craig Topper 280f133773 [DAGCombiner] Remove duplicate code from visitZERO_EXTEND
This exact block of code exists right below.

Differential Revision: https://reviews.llvm.org/D38122

llvm-svn: 313891
2017-09-21 17:30:02 +00:00
Daniel Jasper 7d2f38d600 Revert r313825: "[IR] Add llvm.dbg.addr, a control-dependent version of llvm.dbg.declare"
.. as well as the two subsequent changes r313826 and r313875.

This leads to segfaults in combination with ASAN. Will forward repro
instructions to the original author (rnk).

llvm-svn: 313876
2017-09-21 12:07:33 +00:00
Strahinja Petrovic 29202f6dc1 Fixed reverted commit rL312318
This patch contains fix for reverted commit
rL312318 which was causing failure due to use
of unchecked dyn_cast to CIInit.

Patch by: Nikola Prica.

llvm-svn: 313870
2017-09-21 10:04:02 +00:00
Craig Topper f0ba300332 [SelectionDAG] Replace a flag that can never be true with an assert.
llvm-svn: 313847
2017-09-21 00:18:46 +00:00
Craig Topper eb0f71f232 [SelectionDAG] Use APInt::getActivebits instead of Bitwidth - leading zeros.
llvm-svn: 313839
2017-09-20 23:48:56 +00:00
Reid Kleckner 3f547e87b2 [IR] Add llvm.dbg.addr, a control-dependent version of llvm.dbg.declare
Summary:
This implements the design discussed on llvm-dev for better tracking of
variables that live in memory through optimizations:
  http://lists.llvm.org/pipermail/llvm-dev/2017-September/117222.html

This is tracked as PR34136

llvm.dbg.addr is intended to be produced and used in almost precisely
the same way as llvm.dbg.declare is today, with the exception that it is
control-dependent. That means that dbg.addr should always have a
position in the instruction stream, and it will allow passes that
optimize memory operations on local variables to insert llvm.dbg.value
calls to reflect deleted stores. See SourceLevelDebugging.rst for more
details.

The main drawback to generating DBG_VALUE machine instrs is that they
usually cause LLVM to emit a location list for DW_AT_location. The next
step will be to teach DwarfDebug.cpp how to recognize more DBG_VALUE
ranges as not needing a location list, and possibly start setting
DW_AT_start_offset for variables whose lifetimes begin mid-scope.

Reviewers: aprantl, dblaikie, probinson

Subscribers: eraman, hiraditya, llvm-commits

Differential Revision: https://reviews.llvm.org/D37768

llvm-svn: 313825
2017-09-20 21:52:33 +00:00
Saleem Abdulrasool 432b88e5f4 CodeGen: support SwiftError SwiftCC on Windows x64
Add support for passing SwiftError through a register on the Windows x64
calling convention.  This allows the use of swifterror attributes on
parameters which is used by the swift front end for the `Error`
parameter.  This partially enables building the swift standard library
for Windows x86_64.

llvm-svn: 313791
2017-09-20 18:40:59 +00:00
Reid Kleckner 4e04028791 Re-land "[DebugInfo] Insert DW_OP_deref when spilling indirect DBG_VALUEs"
After r313775, it's easier to maintain a parallel BitVector of spilled
locations indexed by location number.

I wasn't able to build a good reduced test case for this iteration of
the bug, but I added a more direct assertion that spilled values must
use frame index locations. If this bug reappears, it won't only fire on
the NEON vector code that we detected it on, but on medium-sized
integer-only programs as well.

llvm-svn: 313786
2017-09-20 18:19:08 +00:00
Reid Kleckner 92687d45db [DebugInfo] Use a MapVector to coalesce MachineOperand locations
Summary:
The new code should be linear in the number of DBG_VALUEs, while the old
code was quadratic. NFC intended.

This is also hopefully a more direct expression of the problem, which is
to:

1. Rewrite all virtual register operands to stack slots or physical
   registers
2. Uniquely number those machine operands, assigning them location
   numbers
3. Rewrite all uses of the old location numbers in the interval map to
   use the new location numbers

In r313400, I attempted to track which locations were spilled in a
parallel bitvector indexed by location number. My code was broken
because these location numbers are not stable during rewriting.

Reviewers: aprantl, hans

Subscribers: hiraditya, llvm-commits

Differential Revision: https://reviews.llvm.org/D38068

llvm-svn: 313775
2017-09-20 17:32:54 +00:00
Florian Hahn ceb4494786 Recommit [MachineCombiner] Update instruction depths incrementally for large BBs.
This version of the patch fixes an off-by-one error causing PR34596. We
do not need to use std::next(BlockIter) when calling updateDepths, as
BlockIter already points to the next element.

Original commit message:
> For large basic blocks with lots of combinable instructions, the
> MachineTraceMetrics computations in MachineCombiner can dominate the compile
> time, as computing the trace information is quadratic in the number of
> instructions in a BB and it's relevant successors/predecessors.

> In most cases, knowing the instruction depth should be enough to make
> combination decisions. As we already iterate over all instructions in a basic
> block, the instruction depth can be computed incrementally. This reduces the
> cost of machine-combine drastically in cases where lots of instructions
> are combined. The major drawback is that AFAIK, computing the critical path
> length cannot be done incrementally. Therefore we only compute
> instruction depths incrementally, for basic blocks with more
> instructions than inc_threshold. The -machine-combiner-inc-threshold
> option can be used to set the threshold and allows for easier
> experimenting and checking if using incremental updates for all basic
> blocks has any impact on the performance.
>
> Reviewers: sanjoy, Gerolf, MatzeB, efriedma, fhahn
>
> Reviewed By: fhahn
>
> Subscribers: kiranchandramohan, javed.absar, efriedma, llvm-commits
>
> Differential Revision: https://reviews.llvm.org/D36619

llvm-svn: 313751
2017-09-20 11:54:37 +00:00
Quentin Colombet d652aeb144 [MIRPrinter] Print empty successor lists when they cannot be guessed
This re-applies commit r313685, this time with the proper updates to
the test cases.

Original commit message:
Unreachable blocks in the machine instr representation are these
weird empty blocks with no successors.
The MIR printer used to not print empty lists of successors. However,
the MIR parser now treats non-printed list of successors as "please
guess it for me". As a result, the parser tries to guess the list of
successors and given the block is empty, just assumes it falls through
the next block (if any).

For instance, the following test case used to fail the verifier.
The MIR printer would print

         entry
        /      \
   true (def)   false (no list of successors)
       |
 split.true (use)

The MIR parser would understand this:

         entry
        /      \
   true (def)   false
       |        /  <-- invalid edge
 split.true (use)

Because of the invalid edge, we get the "def does not
dominate all uses" error.

The fix consists in printing empty successor lists, so that the parser
knows what to do for unreachable blocks.

rdar://problem/34022159

llvm-svn: 313696
2017-09-19 23:34:12 +00:00
Saleem Abdulrasool 399a4e9b0b CodeGen: use range based for loops (NFC)
Simplify the RPOT traversal by using a range based for loop for the
iterator dereference.

llvm-svn: 313687
2017-09-19 22:10:20 +00:00
Quentin Colombet 6888dbcda7 Revert "[MIRPrinter] Print empty successor lists when they cannot be guessed"
This reverts commit r313685.

I thought I had ran ninja check, but apparently I didn't...
Need to update a bunch of mir tests.

llvm-svn: 313686
2017-09-19 22:03:50 +00:00
Quentin Colombet 7fdaa5e641 [MIRPrinter] Print empty successor lists when they cannot be guessed
Unreachable blocks in the machine instr representation are these
weird empty blocks with no successors.
The MIR printer used to not print empty lists of successors. However,
the MIR parser now treats non-printed list of successors as "please
guess it for me". As a result, the parser tries to guess the list of
successors and given the block is empty, just assumes it falls through
the next block (if any).

For instance, the following test case used to fail the verifier.
The MIR printer would print
          entry
         /      \
    true (def)   false (no list of successors)
        |
  split.true (use)

The MIR parser would understand this:
          entry
         /      \
    true (def)   false
        |        /  <-- invalid edge
  split.true (use)

Because of the invalid edge, we get the "def does not
dominate all uses" error.

The fix consists in printing empty successor lists, so that the parser
knows what to do for unreachable blocks.

rdar://problem/34022159

llvm-svn: 313685
2017-09-19 21:55:51 +00:00
Reid Kleckner ffdf087499 Revert "[DebugInfo] Insert DW_OP_deref when spilling indirect DBG_VALUEs"
This reverts r313640, originally r313400, one more time for essentially
the same issue. My BitVector of spilled location numbers isn't working
because we coalesce identical DBG_VALUE locations as we rewrite them,
invalidating the location numbers used to index the BitVector.

llvm-svn: 313679
2017-09-19 21:18:32 +00:00
Reid Kleckner 26fa1bf4da Re-land "Fix Bug 30978 by emitting cv file checksums."
This reverts r313431 and brings back r313374 with a fix to write
checksums as binary data and not ASCII hex strings.

llvm-svn: 313657
2017-09-19 18:14:45 +00:00
Reid Kleckner 86c74dd791 Re-land r313400 "[DebugInfo] Insert DW_OP_deref when spilling indirect DBG_VALUEs"
I forgot to zero out the BitVector when reusing it between UserValues.

Later uses of the same location number for a different UserValue would
falsely indicate that they were spilled. Usually this would lead to
incorrect debug info, but in some cases they would indicate something
nonsensical like a memory location based on a vector register (Q8 on
ARM).

llvm-svn: 313640
2017-09-19 16:32:15 +00:00
Hans Wennborg 4de620ab3d Revert r313400 "[DebugInfo] Insert DW_OP_deref when spilling indirect DBG_VALUEs"
This caused asserts in Chromium. See http://crbug.com/766261

> Summary:
> This comes up in optimized debug info for C++ programs that pass and
> return objects indirectly by address. In these programs,
> llvm.dbg.declare survives optimization, which causes us to emit indirect
> DBG_VALUE instructions. The fast register allocator knows to insert
> DW_OP_deref when spilling indirect DBG_VALUE instructions, but the
> LiveDebugVariables did not until this change.
>
> This fixes part of PR34513. I need to look into why this doesn't work at
> -O0 and I'll send follow up patches to handle that.
>
> Reviewers: aprantl, dblaikie, probinson
>
> Subscribers: qcolombet, hiraditya, llvm-commits
>
> Differential Revision: https://reviews.llvm.org/D37911

llvm-svn: 313589
2017-09-18 23:08:42 +00:00
Sanjay Patel f31b1a00ea [DAGCombiner] fold assertzexts separated by trunc
If we have an AssertZext of a truncated value that has already been AssertZext'ed, 
we can assert on the wider source op to improve the zext-y knowledge:
 assert (trunc (assert X, i8) to iN), i1 --> trunc (assert X, i1) to iN

This moves a fold from being Mips-specific to general combining, and x86 shows
improvements.

Differential Revision: https://reviews.llvm.org/D37017

llvm-svn: 313577
2017-09-18 22:05:35 +00:00
Sanjay Patel 7765c93be2 [DAG, x86] allow store merging before and after legalization (PR34217)
rL310710 allowed store merging to occur after legalization to catch stores that are created late,
but this exposes a logic hole seen in PR34217:
https://bugs.llvm.org/show_bug.cgi?id=34217

We will miss merging stores if the target lowers vector extracts into target-specific operations.
This patch allows store merging to occur both before and after legalization if the target chooses
to get maximum merging.

I don't think the potential regressions in the other tests are relevant. The tests are for
correctness of weird IR constructs rather than perf tests, and I think those are still correct.

Differential Revision: https://reviews.llvm.org/D37987

llvm-svn: 313564
2017-09-18 20:54:26 +00:00
Ahmed Bougacha d630a92b0e [GlobalISel] Only build expensive remarks if they're enabled. NFC.
r313390 taught 'allowExtraAnalysis' to check whether remarks are
enabled at all.  Use that to only do the expensive instruction printing
if they are.

llvm-svn: 313552
2017-09-18 18:50:09 +00:00
Simon Pilgrim 0b21ef1fa3 [SelectionDAG] Add BITCAST handling to ComputeNumSignBits for splatted sign bits.
For cases where we are BITCASTing to vectors of smaller elements, then if the entire source was a splatted sign (src's NumSignBits == SrcBitWidth) we can say that the dst's NumSignBit == DstBitWidth, as we're just splitting those sign bits across multiple elements.

We could generalize this but at the moment the only use case I have is to peek through bitcasts to vector comparison results.

Differential Revision: https://reviews.llvm.org/D37849

llvm-svn: 313543
2017-09-18 16:45:05 +00:00
Eric Beckmann 913213c8ae Revert "Fix Bug 30978 by emitting cv file checksums."
This reverts commit 6389e7aa724ea7671d096f4770f016c3d86b0d54.

There is a bug in this implementation where the string value of the
checksum is outputted, instead of the actual hex bytes.  Therefore the
checksum is incorrect, and this prevent pdbs from being loaded by visual
studio.  Revert this until the checksum is emitted correctly.

llvm-svn: 313431
2017-09-16 01:14:36 +00:00
Reid Kleckner eed0973270 Name the sentinel value used for the location number of the undefined register NFC
llvm-svn: 313405
2017-09-15 22:08:50 +00:00
Reid Kleckner 3a66c1cb58 [DebugInfo] Insert DW_OP_deref when spilling indirect DBG_VALUEs
Summary:
This comes up in optimized debug info for C++ programs that pass and
return objects indirectly by address. In these programs,
llvm.dbg.declare survives optimization, which causes us to emit indirect
DBG_VALUE instructions. The fast register allocator knows to insert
DW_OP_deref when spilling indirect DBG_VALUE instructions, but the
LiveDebugVariables did not until this change.

This fixes part of PR34513. I need to look into why this doesn't work at
-O0 and I'll send follow up patches to handle that.

Reviewers: aprantl, dblaikie, probinson

Subscribers: qcolombet, hiraditya, llvm-commits

Differential Revision: https://reviews.llvm.org/D37911

llvm-svn: 313400
2017-09-15 21:54:38 +00:00
Reid Kleckner 9e6c309ef3 [DebugInfo] Add missing DW_OP_deref when an NRVO pointer is spilled
Summary:
Fixes PR34513.

Indirect DBG_VALUEs typically come from dbg.declares of non-trivially
copyable C++ objects that must be passed by address. We were already
handling the case where the virtual register gets allocated to a
physical register and is later spilled. That's what usually happens for
normal parameters that aren't NRVO variables: they usually appear in
physical register parameters, and are spilled later in the function,
which would correctly add deref.

NRVO variables are different because the dbg.declare can come much later
after earlier instructions cause the incoming virtual register to be
spilled.

Also, clean up this code. We only need to look at the first operand of a
DBG_VALUE, which eliminates the operand loop.

Reviewers: aprantl, dblaikie, probinson

Subscribers: MatzeB, qcolombet, llvm-commits, hiraditya

Differential Revision: https://reviews.llvm.org/D37929

llvm-svn: 313399
2017-09-15 21:49:56 +00:00
Sam Clegg 759631c77b [WebAssembly] MC: Create wasm data segments based on MCSections
This means that we can honor -fdata-sections rather than
always creating a segment for each symbol.

It also allows for a followup change to add .init_array and friends.

Differential Revision: https://reviews.llvm.org/D37876

llvm-svn: 313395
2017-09-15 20:54:59 +00:00
Sam Clegg 66a99e41cd Change encodeU/SLEB128 to pad to certain number of bytes
Previously the 'Padding' argument was the number of padding
bytes to add. However most callers that use 'Padding' know
how many overall bytes they need to write.  With the previous
code this would mean encoding the LEB once to find out how
many bytes it would occupy and then using this to calulate
the 'Padding' value.

See: https://reviews.llvm.org/D36595

Differential Revision: https://reviews.llvm.org/D37494

llvm-svn: 313393
2017-09-15 20:34:47 +00:00
Hans Wennborg 534bfbd3ba Revert r313343 "[X86] PR32755 : Improvement in CodeGen instruction selection for LEAs."
This caused PR34629: asserts firing when building Chromium. It also broke some
buildbots building test-suite as reported on the commit thread.

> Summary:
>    1/  Operand folding during complex pattern matching for LEAs has been
>        extended, such that it promotes Scale to accommodate similar operand
>        appearing in the DAG.
>        e.g.
>           T1 = A + B
>           T2 = T1 + 10
>           T3 = T2 + A
>        For above DAG rooted at T3, X86AddressMode will no look like
>           Base = B , Index = A , Scale = 2 , Disp = 10
>
>    2/  During OptimizeLEAPass down the pipeline factorization is now performed over LEAs
>        so that if there is an opportunity then complex LEAs (having 3 operands)
>        could be factored out.
>        e.g.
>           leal 1(%rax,%rcx,1), %rdx
>           leal 1(%rax,%rcx,2), %rcx
>        will be factored as following
>           leal 1(%rax,%rcx,1), %rdx
>           leal (%rdx,%rcx)   , %edx
>
>    3/ Aggressive operand folding for AM based selection for LEAs is sensitive to loops,
>       thus avoiding creation of any complex LEAs within a loop.
>
> Reviewers: lsaba, RKSimon, craig.topper, qcolombet
>
> Reviewed By: lsaba
>
> Subscribers: spatel, igorb, llvm-commits
>
> Differential Revision: https://reviews.llvm.org/D35014

llvm-svn: 313376
2017-09-15 18:40:26 +00:00
Eric Beckmann 349746f044 Fix Bug 30978 by emitting cv file checksums.
Summary:
The checksums had already been placed in the IR, this patch allows
MCCodeView to actually write it out to an MCStreamer.

Subscribers: llvm-commits, hiraditya

Differential Revision: https://reviews.llvm.org/D37157

llvm-svn: 313374
2017-09-15 18:20:28 +00:00
Jonas Paulsson 6188f326eb Recommit "[RegAlloc] Make sure live-ranges reflect the state of the IR when
removing them"

This was temporarily reverted, but now that the fix has been commited (r313197)
it should be put back in place.

https://bugs.llvm.org/show_bug.cgi?id=34502

This reverts commit 9ef93d9dc4c51568e858cf8203cd2c5ce8dca796.

llvm-svn: 313349
2017-09-15 07:47:38 +00:00
Jatin Bhateja 908c8b37c2 [X86] PR32755 : Improvement in CodeGen instruction selection for LEAs.
Summary:
   1/  Operand folding during complex pattern matching for LEAs has been
       extended, such that it promotes Scale to accommodate similar operand
       appearing in the DAG.
       e.g.
          T1 = A + B
          T2 = T1 + 10
          T3 = T2 + A
       For above DAG rooted at T3, X86AddressMode will no look like
          Base = B , Index = A , Scale = 2 , Disp = 10

   2/  During OptimizeLEAPass down the pipeline factorization is now performed over LEAs
       so that if there is an opportunity then complex LEAs (having 3 operands)
       could be factored out.
       e.g.
          leal 1(%rax,%rcx,1), %rdx
          leal 1(%rax,%rcx,2), %rcx
       will be factored as following
          leal 1(%rax,%rcx,1), %rdx
          leal (%rdx,%rcx)   , %edx

   3/ Aggressive operand folding for AM based selection for LEAs is sensitive to loops,
      thus avoiding creation of any complex LEAs within a loop.

Reviewers: lsaba, RKSimon, craig.topper, qcolombet

Reviewed By: lsaba

Subscribers: spatel, igorb, llvm-commits

Differential Revision: https://reviews.llvm.org/D35014

llvm-svn: 313343
2017-09-15 05:29:51 +00:00
Reid Kleckner 87288b98b6 [codeview] Use a type index of zero for static method "this" types
Otherwise VS won't show anything in the autos or watch window of static
methods.

llvm-svn: 313329
2017-09-15 00:59:07 +00:00
Jan Sjodin 312ccf761c Add AddresSpace to PseudoSourceValue.
Differential Revision: https://reviews.llvm.org/D35089

llvm-svn: 313297
2017-09-14 20:53:51 +00:00
Benjamin Kramer 591aac7cdf Remove usages of deprecated std::unary_function and std::binary_function.
These are removed in C++17. We still have some users of
unary_function::argument_type, so just spell that typedef out. No
functionality change intended.

Note that many of the argument types are actually wrong :)

llvm-svn: 313287
2017-09-14 18:33:25 +00:00
Krzysztof Parzyszek 779d98e1c0 TableGen support for parameterized register class information
This replaces TableGen's type inference to operate on parameterized
types instead of MVTs, and as a consequence, some interfaces have
changed:
- Uses of MVTs are replaced by ValueTypeByHwMode.
- EEVT::TypeSet is replaced by TypeSetByHwMode.

This affects the way that types and type sets are printed, and the
tests relying on that have been updated.

There are certain users of the inferred types outside of TableGen
itself, namely FastISel and GlobalISel. For those users, the way
that the types are accessed have changed. For typical scenarios,
these replacements can be used:
- TreePatternNode::getType(ResNo) -> getSimpleType(ResNo)
- TreePatternNode::hasTypeSet(ResNo) -> hasConcreteType(ResNo)
- TypeSet::isConcrete -> TypeSetByHwMode::isValueTypeByHwMode(false)

For more information, please refer to the review page.

Differential Revision: https://reviews.llvm.org/D31951

llvm-svn: 313271
2017-09-14 16:56:21 +00:00
Krzysztof Parzyszek 6ca02b25a7 [IfConversion] More simple, correct dead/kill liveness handling
Patch by Jesper Antonsson.

Differential Revision: https://reviews.llvm.org/D37611

llvm-svn: 313268
2017-09-14 15:53:11 +00:00
Simon Pilgrim 8bd2d8780a [DAGCombine] (shl (or x, c1), c2) -> (or (shl x, c2), c1 << c2)
We already have a combine for this pattern when the input to shl is add, so we just need to enable the transformation when the input is or.

Original patch by @tstellar

Differential Revision: https://reviews.llvm.org/D19325

llvm-svn: 313251
2017-09-14 10:38:30 +00:00
Simon Pilgrim 523483e0bd [SelectionDAG] ComputeNumSignBits - cleanup ROTL/ROTR wrapping to match DAGCombine etc.
Use RotAmt.urem(VTBits) instead of AND(RotAmt, VTBits - 1)

TBH I don't expect non-power-of-2 types to be created, but it makes the logic clearer and matches what we do in other rotation combines.

llvm-svn: 313245
2017-09-14 10:28:01 +00:00
Dean Michael Berris 01fd7c8bd4 [XRay][CodeGen] Use the current function symbol as the associated symbol for the instrumentation map
Summary:
XRay had been assuming that the previous section is the "text" section
of the function when lowering the instrumentation map. Unfortunately
this is not a safe assumption, because we may be coming from lowering
debug type information for the function being lowered.

This fixes an issue with combining -gsplit-dwarf, -generate-type-units,
-debug-compile and -fxray-instrument for sole member functions. When the
split dwarf section is stripped, we're left with references from the
xray_instr_map to the debug section. The change now uses the function's
symbol instead of the previous section's start symbol.

We found the bug while attempting to strip the split debug sections off
an XRay-instrumented object file, which had a peculiar edge-case for
single-function classes where the single function is being lowered.
Because XRay had assocaited the instrumentation map for a function to
the debug types section instead of the function's section, the objcopy
call will fail due to the misplaced reference from the xray_instr_map
section.

Reviewers: pcc, dblaikie, echristo

Subscribers: llvm-commits, aprantl

Differential Revision: https://reviews.llvm.org/D37791

llvm-svn: 313233
2017-09-14 07:08:23 +00:00
Reid Kleckner cd7bba0264 [codeview] Fold FIXME into comment, there's nothing to do. NFC
llvm-svn: 313214
2017-09-13 23:30:01 +00:00
Hans Wennborg 06e2a384c2 Revert r312719 "[MachineCombiner] Update instruction depths incrementally for large BBs."
This caused PR34596.

> [MachineCombiner] Update instruction depths incrementally for large BBs.
>
> Summary:
> For large basic blocks with lots of combinable instructions, the
> MachineTraceMetrics computations in MachineCombiner can dominate the compile
> time, as computing the trace information is quadratic in the number of
> instructions in a BB and it's relevant successors/predecessors.
>
> In most cases, knowing the instruction depth should be enough to make
> combination decisions. As we already iterate over all instructions in a basic
> block, the instruction depth can be computed incrementally. This reduces the
> cost of machine-combine drastically in cases where lots of instructions
> are combined. The major drawback is that AFAIK, computing the critical path
> length cannot be done incrementally. Therefore we only compute
> instruction depths incrementally, for basic blocks with more
> instructions than inc_threshold. The -machine-combiner-inc-threshold
> option can be used to set the threshold and allows for easier
> experimenting and checking if using incremental updates for all basic
> blocks has any impact on the performance.
>
> Reviewers: sanjoy, Gerolf, MatzeB, efriedma, fhahn
>
> Reviewed By: fhahn
>
> Subscribers: kiranchandramohan, javed.absar, efriedma, llvm-commits
>
> Differential Revision: https://reviews.llvm.org/D36619

llvm-svn: 313213
2017-09-13 23:23:09 +00:00
Stanislav Mekhanoshin 7fe9a5d9b4 Allow target to decide when to cluster loads/stores in misched
MachineScheduler when clustering loads or stores checks if base
pointers point to the same memory. This check is done through
comparison of base registers of two memory instructions. This
works fine when instructions have separate offset operand. If
they require a full calculated pointer such instructions can
never be clustered according to such logic.

Changed shouldClusterMemOps to accept base registers as well and
let it decide what to do about it.

Differential Revision: https://reviews.llvm.org/D37698

llvm-svn: 313208
2017-09-13 22:20:47 +00:00
Reid Kleckner 89af112cf5 [codeview] VLAs and unsized arrays should use a size of zero
Previously we used a size of '1' for VLAs because we weren't sure what
MSVC did. However, MSVC does support declaring an array without a size,
for which it emits an array type with a size of zero. Clang emits the
same DI metadata for VLAs and arrays without bound, so we would describe
arrays without bound as having one element. This lead to Microsoft
debuggers only printing a single element.

Emitting a size of zero appears to cause these debuggers to search the
symbol information to find a definition of the variable with accurate
array bounds.

Fixes http://crbug.com/763580

llvm-svn: 313203
2017-09-13 21:54:20 +00:00
Wei Mi c0d066468e [RegAlloc] Keep a copy of live interval for the spilled vregs in HoistSpillHelper.
This is to fix PR34502. After rL311401, the live range of spilled vreg will be
cleared. HoistSpill need to use the live range of the original vreg before splitting
to know the moving range of the spills. The patch saves a copy of live interval for
the spilled vreg inside of HoistSpillHelper.

Differential Revision: https://reviews.llvm.org/D37578

llvm-svn: 313197
2017-09-13 21:41:30 +00:00
Eugene Zelenko 618c555bbe [CodeGen] Fix some Clang-tidy modernize and Include What You Use warnings; other minor fixes (NFC).
llvm-svn: 313194
2017-09-13 21:15:20 +00:00
Adrian McCarthy d91bf3998f Mark static member functions as static in CodeViewDebug
Summary:
To improve CodeView quality for static member functions, we need to make the
static explicit.  In addition to a small change in LLVM's CodeViewDebug to
return the appropriate MethodKind, this requires a small change in Clang to
note the staticness in the debug info metadata.

Subscribers: aprantl, hiraditya

Differential Revision: https://reviews.llvm.org/D37715

llvm-svn: 313192
2017-09-13 20:53:55 +00:00
Mikael Holmen 4eb2a96e7f [MachineScheduler] Put SchedRegion in an anonymous namespace.
Summary: It pollutes the global namespace otherwise.

Patch by: Bevin Hansson

Reviewers: jonpa

Reviewed By: jonpa

Subscribers: MatzeB, javed.absar, llvm-commits

Differential Revision: https://reviews.llvm.org/D37555

llvm-svn: 313148
2017-09-13 14:07:47 +00:00
Peter Collingbourne 876da0294a Remove -generate-dwarf-pub-sections flag.
This flag is unnecessary for testing because we can get the coverage
we need by adjusting CU attributes.

Differential Revision: https://reviews.llvm.org/D37725

llvm-svn: 313079
2017-09-12 21:50:55 +00:00
Peter Collingbourne b52e23669c IR: Represent -ggnu-pubnames with a flag on the DICompileUnit.
This allows the flag to be persisted through to LTO.

Differential Revision: https://reviews.llvm.org/D37655

llvm-svn: 313078
2017-09-12 21:50:41 +00:00
Lei Huang 34e6621724 Update branch coalescing to be a PowerPC specific pass
Implementing this pass as a PowerPC specific pass.  Branch coalescing utilizes
the analyzeBranch method which currently does not include any implicit operands.
This is not an issue on PPC but must be handled on other targets.

Pass is currently off by default. Enabled via -enable-ppc-branch-coalesce.

Differential Revision : https: // reviews.llvm.org/D32776

llvm-svn: 313061
2017-09-12 18:39:11 +00:00
Sam Clegg 2176a9f2a3 [WebAssembly] Remove flags from MCSectionWasm
Looks like these were copied from the ELF sections but
don't apply to Wasm and were not used anywhere.

Also remove unused Wasm methods in MCContext.

Differential Revision: https://reviews.llvm.org/D37633

llvm-svn: 313058
2017-09-12 18:31:24 +00:00
Robert Lougher 51529eb0c2 Revert "[DWARF] Incorrect prologue end line record."
This reverts commit r313047 as it is causing buildbot failure (lldb inline
stepping tests).

llvm-svn: 313057
2017-09-12 18:23:15 +00:00
Robert Lougher f696a22d3c [DWARF] Incorrect prologue end line record.
A prologue-end line record is emitted with an incorrect associated address,
which causes a debugger to show the beginning of function body to be inside
the prologue.

Patch written by Carlos Alberto Enciso.

Differential Revision: https://reviews.llvm.org/D37625

llvm-svn: 313047
2017-09-12 16:35:25 +00:00
Eugene Zelenko 32a4056438 [CodeGen] Fix some Clang-tidy modernize-use-using and Include What You Use warnings; other minor fixes (NFC).
llvm-svn: 312971
2017-09-11 23:00:48 +00:00
Hiroshi Yamauchi 9364432cec Unmerge GEPs to reduce register pressure on IndirectBr edges.
Summary:
GEP merging can sometimes increase the number of live values and register
pressure across control edges and cause performance problems particularly if the
increased register pressure results in spills.

This change implements GEP unmerging around an IndirectBr in certain cases to
mitigate the issue. This is in the CodeGenPrepare pass (after all the GEP
merging has happened.)

With this patch, the Python interpreter loop runs faster by ~5%.

Reviewers: sanjoy, hfinkel

Reviewed By: hfinkel

Subscribers: eastig, junbuml, llvm-commits

Differential Revision: https://reviews.llvm.org/D36772

llvm-svn: 312930
2017-09-11 17:52:08 +00:00
Craig Topper 8dff57a0ed [SelectionDAG] Remove a check for type being a vector type after calling getShiftAmountTy. NFCI
getShiftAmountTy already returns the vector type when called for vectors.

llvm-svn: 312924
2017-09-11 16:15:39 +00:00
Matt Arsenault eb4474cf20 Fix typo
llvm-svn: 312919
2017-09-11 15:23:22 +00:00
Elena Demikhovsky cc477bbcea Fixed a bug in splitting Scatter operation in the Type Legalizer.
After the split of the Scatter operation, the order of the new instructions is well defined - Lo goes before Hi. Otherwise the semantic of Scatter (from LSB to MSB) is broken.
I'm chaining 2 nodes to prevent reordering.

Differential Revision https://reviews.llvm.org/D37670

llvm-svn: 312894
2017-09-11 06:18:15 +00:00
Matthias Braun 6b2b88b071 RegAllocFast: Fix warning; NFC
llvm-svn: 312852
2017-09-09 01:16:59 +00:00
Matthias Braun 864cf585ff RegAllocFast: Cleanup; NFC
- Use range based for
- Variable names should start with upper case
- Add `const`
- Change class name to match filename
- Fix doxygen comments
- Use MCPhysReg instead of unsigned
- Use references instead of pointers where things cannot be nullptr
- Misc coding style improvements

llvm-svn: 312846
2017-09-09 00:52:46 +00:00
Matthias Braun a09d18deb0 RegAllocFast: Move vector to class level to avoid reallocation; NFC
llvm-svn: 312845
2017-09-09 00:52:45 +00:00
Matthias Braun a5225e8cb0 RegAllocFast: Remove write-only set; NFC
llvm-svn: 312844
2017-09-09 00:52:42 +00:00
Wei Mi 5d84d9b35c Fix a bug for rL312641.
rL312641 Allowed llvm.memcpy/memset/memmove to be tail calls when parent
function return the intrinsics's first argument. However on arm-none-eabi
platform, llvm.memcpy will be expanded to __aeabi_memcpy which doesn't
have return value. The fix is to check the libcall name after expansion
to match "memcpy/memset/memmove" before allowing those intrinsic to be
tail calls.

llvm-svn: 312799
2017-09-08 16:44:52 +00:00
Krzysztof Parzyszek f78eca8fb5 Preserve existing regs when adding pristines to LivePhysRegs/LiveRegUnits
Differential Revision: https://reviews.llvm.org/D37600

llvm-svn: 312797
2017-09-08 16:29:50 +00:00
Adrian Prantl 99ba97726a Fix a crash when emitting debug info for multi-reg function arguments
by reusing more of the existing machinery

This is a follow-up to r312169.
Thanks to Björn Pettersson for the testcase!

llvm-svn: 312773
2017-09-08 02:31:37 +00:00
Dean Michael Berris 711dec260f [XRay][CodeGen][PowerPC] Fix tail exit codegen for XRay in PPC
Summary:
This fixes code-gen for XRay in PPC. The regression wasn't caught by
codegen tests  which we add in this change.

What happened was the following:

- For tail exits, we used to unconditionally prepend the returns/exits
  with a pseudo-instruction that gets lowered to the instrumentation
  sled (and leave the actual return/exit instruction as-is).
- Changes to the XRay instrumentation pass caused the tail exits to
  suddenly also emit the tail exit pseudo-instruction, since the check
  for whether a return instruction was also a call instruction meant it
  was a tail exit instruction.
- None of the tests caught the regression either due to non-existent
  tests, or the tests being disabled/removed for continuous breakage.

This change re-introduces some of the basic tests and verifies that
we're back to a state that allows the back-end to generate appropriate
XRay instrumented binaries for PPC in the presence of tail exits.

Reviewers: echristo, timshen

Subscribers: nemanjai, kbarton, llvm-commits

Differential Revision: https://reviews.llvm.org/D37570

llvm-svn: 312772
2017-09-08 01:47:56 +00:00
Reid Kleckner 0e8c4bb055 Sink some IntrinsicInst.h and Intrinsics.h out of llvm/include
Many of these uses can get by with forward declarations. Hopefully this
speeds up compilation after adding a single intrinsic.

llvm-svn: 312759
2017-09-07 23:27:44 +00:00
Richard Trieu c7828ebea4 Revert r312318, r312325, r312424, r312489
r312318 - Debug info for variables whose type is shrinked to bool
r312325, r312424, r312489 - Test case for r312318

Revision 312318 introduced a null dereference bug.
Details in https://bugs.llvm.org/show_bug.cgi?id=34490

llvm-svn: 312758
2017-09-07 23:20:35 +00:00
Paul Robinson bb92137080 [DWARF] Line 0 should not have a discriminator.
It's meaningless and takes up extra space in the line table.

Differential Revision: https://reviews.llvm.org/D37364

llvm-svn: 312751
2017-09-07 22:15:44 +00:00
Matt Arsenault 61ec738b60 DAG: Allow creating extract_vector_elt post-legalize
Fixes some combine issues for AMDGPU where we weren't
getting the many extract_vector_elt combines expected
in a future patch.

This should really be checking isOperationLegalOrCustom on
the extract. That improves a number of x86 lit tests, but
a few get stuck in an infinite loop from one place
where a similar looking extract is created. I have a
different workaround in the backend for that which
keeps many of those improvements, but also adds a few
regressions.

llvm-svn: 312730
2017-09-07 17:24:43 +00:00
Florian Hahn d39b8a3533 [MachineCombiner] Update instruction depths incrementally for large BBs.
Summary:
For large basic blocks with lots of combinable instructions, the
MachineTraceMetrics computations in MachineCombiner can dominate the compile
time, as computing the trace information is quadratic in the number of
instructions in a BB and it's relevant successors/predecessors.

In most cases, knowing the instruction depth should be enough to make
combination decisions. As we already iterate over all instructions in a basic
block, the instruction depth can be computed incrementally. This reduces the
cost of machine-combine drastically in cases where lots of instructions
are combined. The major drawback is that AFAIK, computing the critical path
length cannot be done incrementally. Therefore we only compute
instruction depths incrementally, for basic blocks with more
instructions than inc_threshold. The -machine-combiner-inc-threshold
option can be used to set the threshold and allows for easier
experimenting and checking if using incremental updates for all basic
blocks has any impact on the performance.

Reviewers: sanjoy, Gerolf, MatzeB, efriedma, fhahn

Reviewed By: fhahn

Subscribers: kiranchandramohan, javed.absar, efriedma, llvm-commits

Differential Revision: https://reviews.llvm.org/D36619

llvm-svn: 312719
2017-09-07 12:49:39 +00:00
Florian Hahn cf0cdd4c02 [MachineTraceMetrics] Add computeDepth function (NFCI).
Summary:
This function is used in D36619 to update the instruction depths
incrementally.

Reviewers: efriedma, Gerolf, MatzeB, fhahn

Reviewed By: fhahn

Subscribers: llvm-commits

Differential Revision: https://reviews.llvm.org/D36696

llvm-svn: 312714
2017-09-07 11:51:30 +00:00
Jonas Paulsson 0f056352a8 Revert "[RegAlloc] Make sure live-ranges reflect the state of the IR when removing them"
This temporarily reverts commit 463fa38 (r311401).

See https://bugs.llvm.org/show_bug.cgi?id=34502

llvm-svn: 312708
2017-09-07 09:13:17 +00:00
Matthias Braun c9056b834d Insert IMPLICIT_DEFS for undef uses in tail merging
Tail merging can convert an undef use into a normal one when creating a
common tail. Doing so can make the register live out from a block which
previously contained the undef use. To keep the liveness up-to-date,
insert IMPLICIT_DEFs in such blocks when necessary.

To enable this patch the computeLiveIns() function which used to
compute live-ins for a block and set them immediately is split into new
functions:
- computeLiveIns() just computes the live-ins in a LivePhysRegs set.
- addLiveIns() applies the live-ins to a block live-in list.
- computeAndAddLiveIns() is a convenience function combining the other
  two functions and behaving like computeLiveIns() before this patch.

Based on a patch by Krzysztof Parzyszek <kparzysz@codeaurora.org>

Differential Revision: https://reviews.llvm.org/D37034

llvm-svn: 312668
2017-09-06 20:45:24 +00:00
Krzysztof Parzyszek a3017aa2ab [IfConversion] Remove kill flags from common instructions as well
When if-converting a diamond, two separate blocks will be placed back
to back to form a straight line code. To ensure correctness of the
liveness information, any registers that are live in the second block
should not be killed in the first block, even if they were in the
original code.
Additionally, when the two blocks share common instructions at the
beginning, these instructions will not be duplicated, but only placed
once, before both of the blocks. Since the function "isIdenticalTo"
(as used here) ignores kill flags, the common initial code in one
block may have a kill flag for a register that is live in the other
block.
Because the code that removes kill flags only runs for the non-common
parts of the predicated blocks, a kill flag mismatch in the common
code could still lead to a live register being killed prematurely.

llvm-svn: 312654
2017-09-06 17:57:13 +00:00
Wei Mi 818d50a93d [TailCall] Allow llvm.memcpy/memset/memmove to be tail calls when parent
function return the intrinsics's first argument.

llvm.memcpy/memset/memmove return void but they will return the first
argument after they are expanded as libcalls. Now if the parent function
has any return value, llvm.memcpy cannot be turned into tail call after
expansion.

The patch is to handle that case in SelectionDAGBuilder so when caller
function return the same value as the first argument of llvm.memcpy,
tail call is allowed.

Differential Revision: https://reviews.llvm.org/D37406

llvm-svn: 312641
2017-09-06 16:05:17 +00:00
Craig Topper 761bb1b53d [DAGCombiner] When combining EXTRACT_SUBVECTOR of a BUILD_VECTOR, make sure we don't create a BUILD_VECTOR with an illegal type after type legalization.
llvm-svn: 312621
2017-09-06 06:50:03 +00:00
Zachary Turner 37c747498d [CodeView] Don't output S_UDTs for nested typedefs.
S_UDT records are basically the "bridge" between the debugger's
expression evaluator and the type information. If you type
(Foo*)nullptr into the watch window, the debugger looks for an
S_UDT record named Foo. If it can find one, it displays your type.
Otherwise you get an error.

We have always understood this to mean that if you have code like
this:

  struct A {
    int X;
  };

  struct B {
    typedef A AT;
    AT Member;
  };

that you will get 3 S_UDT records. "A", "B", and "B::AT". Because
if you were to type (B::AT*)nullptr into the debugger, it would
need to find an S_UDT record named "B::AT".

But "B::AT" is actually the S_UDT record that would be generated
if B were a namespace, not a struct. So the debugger needs to be
able to distinguish this case. So what it does is:

  1. Look for an S_UDT named "B::AT". If it finds one, it knows
     that AT is in a namespace.
  2. If it doesn't find one, split at the scope resolution operator,
     and look for an S_UDT named B. If it finds one, look up the type
     for B, and then look for AT as one of its members.

With this algorithm, S_UDT records for nested typedefs are not just
unnecessary, but actually wrong!

The results of implementing this in clang are dramatic. It cuts
our /DEBUG:FASTLINK PDB sizes by more than 50%, and we go from
being ~20% larger than MSVC PDBs on average, to ~40% smaller.

It also slightly speeds up link time. We get about 10% faster
links than without this patch.

Differential Revision: https://reviews.llvm.org/D37410

llvm-svn: 312583
2017-09-05 22:06:39 +00:00
Reid Kleckner e33c94f1b0 Add llvm.codeview.annotation to implement MSVC __annotation
Summary:
This intrinsic represents a label with a list of associated metadata
strings. It is modelled as reading and writing inaccessible memory so
that it won't be removed as dead code. I think the intention is that the
annotation strings should appear at most once in the debug info, so I
marked it noduplicate. We are allowed to inline code with annotations as
long as we strip the annotation, but that can be done later.

Reviewers: majnemer

Subscribers: eraman, llvm-commits, hiraditya

Differential Revision: https://reviews.llvm.org/D36904

llvm-svn: 312569
2017-09-05 20:14:58 +00:00
Sam McCall f71bb198ed Revert "Re-enable "[MachineCopyPropagation] Extend pass to do COPY source forwarding""
This crashes on boringSSL on PPC (will send reduced testcase)

This reverts commit r312328.

llvm-svn: 312490
2017-09-04 15:47:00 +00:00
Dean Michael Berris ebc1659016 [XRay][CodeGen] Use PIC-friendly code in XRay sleds and remove synthetic references in .text
Summary:
This is a re-roll of D36615 which uses PLT relocations in the back-end
to the call to __xray_CustomEvent() when building in -fPIC and
-fxray-instrument mode.

Reviewers: pcc, djasper, bkramer

Subscribers: sdardis, javed.absar, llvm-commits

Differential Revision: https://reviews.llvm.org/D37373

llvm-svn: 312466
2017-09-04 05:34:58 +00:00
Ayman Musa 44cde94935 [X86] Fix crash on assert of non-simple type after type-legalization
The function combineShuffleToVectorExtend in DAGCombine might generate an illegal typed node after "legalize types" phase, causing assertion on non-simple type to fail afterwards.

Adding a type check in case the combine is running after the type legalize pass.

Differential Revision: https://reviews.llvm.org/D37330

llvm-svn: 312438
2017-09-03 09:09:16 +00:00
Jessica Paquette b0d17d99dd [MIParser] Ensure getHexUint doesn't produce APInts with a bitwidth of 0
If getHexUint reads in a hex 0, it will create an APInt with a value of 0.
The number of active bits on this APInt is used to calculate the bitwidth of
Result. The number of active bits is defined as an APInt's bitwidth - its
number of leading 0s. Since this APInt is 0, its bitwidth and number of leading
0s are equal.

Thus, Result is constructed with a bitwidth of 0, triggering an APInt assert.

This commit fixes that by checking if the APInt is equal to 0, and setting the
bitwidth to 32 if it is. Otherwise, it sets the bitwidth using getActiveBits.

This caused issues when compiling MIR files with successor probabilities. In
the case that a successor is tagged with a probability of 0, this assert would
fire on debug builds.

https://reviews.llvm.org/D37401

llvm-svn: 312387
2017-09-01 22:17:14 +00:00
Matthias Braun cebdb17522 LiveIntervalAnalysis: Fix alias regunit reserved definition
A register in CodeGen can be marked as reserved: In that case we
consider the register always live and do not use (or rather ignore)
kill/dead/undef operand flags.

LiveIntervalAnalysis however tracks liveness per register unit (not per
register). We already needed adjustments for this in r292871 to deal
with super/sub registers. However I did not look at aliased register
there. Looking at ARM:

FPSCR (regunits FPSCR, FPSCR~FPSCR_NZCV) aliases with FPSCR_NZCV
(regunits FPSCR_NZCV, FPSCR~FPSCR_NZCV) hence they share a register unit
(FPSCR~FPSCR_NZCV) that represents the aliased parts of the registers.
This shared register unit was previously considered non-reserved,
however given that we uses of the reserved FPSCR potentially violate
some rules (like uses without defs) we should make FPSCR~FPSCR_NZCV
reserved too and stop tracking liveness for it.

This patch:
- Defines a register unit as reserved when: At least for one root
  register, the root register and all its super registers are reserved.
- Adjust LiveIntervals::computeRegUnitRange() for new reserved
  definition.
- Add MachineRegisterInfo::isReservedRegUnit() to have a canonical way
  of testing.
- Stop computing LiveRanges for reserved register units in HMEditor even
  with UpdateFlags enabled.
- Skip verification of uses of reserved reg units in the machine
  verifier (this usually didn't happen because there would be no cached
  liverange but there is no guarantee for that and I would run into this
  case before the HMEditor tweak, so may as well fix the verifier too).

Note that this should only affect ARMs FPSCR/FPSCR_NZCV registers today;
aliased registers are rarely used, the only other cases are hexagons
P0-P3/P3_0 and C8/USR pairs which are not mixing reserved/non-reserved
registers in an alias.

Differential Revision: https://reviews.llvm.org/D37356

llvm-svn: 312348
2017-09-01 18:36:26 +00:00
Geoff Berry 65528f2991 Re-enable "[MachineCopyPropagation] Extend pass to do COPY source forwarding"
Issues addressed since original review:
- Moved removal of dead instructions found by
  LiveIntervals::shrinkToUses() outside of loop iterating over
  instructions to avoid instructions being deleted while pointed to by
  iterator.
- Fixed ARMLoadStoreOptimizer bug exposed by this change in r311907.
- The pass no longer forwards COPYs to physical register uses, since
  doing so can break code that implicitly relies on the physical
  register number of the use.
- The pass no longer forwards COPYs to undef uses, since doing so
  can break the machine verifier by creating LiveRanges that don't
  end on a use (since the undef operand is not considered a use).

  [MachineCopyPropagation] Extend pass to do COPY source forwarding

  This change extends MachineCopyPropagation to do COPY source forwarding.

  This change also extends the MachineCopyPropagation pass to be able to
  be run during register allocation, after physical registers have been
  assigned, but before the virtual registers have been re-written, which
  allows it to remove virtual register COPY LiveIntervals that become dead
  through the forwarding of all of their uses.

llvm-svn: 312328
2017-09-01 14:27:20 +00:00
Clement Courbet 65130e2d8d Reland rL312315: [MergeICmps] MergeICmps is a new optimization pass that turns chains of integer
Add missing header.

This reverts commit 86dd6335cf7607af22f383a9a8e072ba929848cf.

llvm-svn: 312322
2017-09-01 10:56:34 +00:00
Strahinja Petrovic 676fd0b022 Debug info for variables whose type is shrinked to bool
This patch provides such debug information for integer
variables whose type is shrinked to bool by providing 
dwarf expression which returns either constant initial 
value or other value.

Patch by Nikola Prica.

Differential Revision: https://reviews.llvm.org/D35994

llvm-svn: 312318
2017-09-01 10:05:27 +00:00
Clement Courbet 316212575b Revert "[MergeICmps] MergeICmps is a new optimization pass that turns chains of integer"
Break build

This reverts commit d07ab866f7f88f81e49046d691a80dcd32d7198b.

llvm-svn: 312317
2017-09-01 09:43:08 +00:00
Clement Courbet 9473c01e96 [MergeICmps] MergeICmps is a new optimization pass that turns chains of integer
comparisons into memcmp.

Thanks to recent improvements in the LLVM codegen, the memcmp is typically
inlined as a chain of efficient hardware comparisons.
This typically benefits C++ member or nonmember operator==().

For now this is disabled by default until:
 - https://bugs.llvm.org/show_bug.cgi?id=33329 is complete
 - Benchmarks show that this is always useful.

Differential Revision:
https://reviews.llvm.org/D33987

llvm-svn: 312315
2017-09-01 09:07:05 +00:00
Jessica Paquette ffe4abc51b [MachineOutliner] Recommit r312194, missed optimization remarks
Before, this commit caused a buildbot failure:

http://bb.pgr.jp/builders/test-llvm-i686-linux-RA/builds/6026/steps/test_llvm/logs/LLVM%20%3A%3A%20CodeGen__AArch64__machine-outliner-remarks.ll

This was caused by the Key value in DiagnosticInfoOptimizationBase being
deallocated before emitting the remarks defined in MachineOutliner.cpp. As of
r312277 this should no longer be an issue.
 

llvm-svn: 312280
2017-08-31 21:02:45 +00:00
Craig Topper 7081f029f3 [DAGCombiner] Do a better job of ensuring we don't split elements when combining an extract_subvector of a bitcasted build_vector.
llvm-svn: 312253
2017-08-31 17:02:22 +00:00