Commit Graph

39519 Commits

Author SHA1 Message Date
Wei Mi b96ebe6a1a Rename test pr30298.ll to shrink_vmul_sse.ll, to make the name more meaningful, NFC.
Add PR number and comment in pr30298.ll to explain what is testing.

llvm-svn: 280843
2016-09-07 18:46:15 +00:00
Wei Mi f100d4e93d Don't reduce the width of vector mul if the target doesn't support SSE2.
The patch is to fix PR30298, which is caused by rL272694. The solution is to
bail out if the target has no SSE2.

Differential Revision: https://reviews.llvm.org/D24288

llvm-svn: 280837
2016-09-07 18:22:17 +00:00
Hans Wennborg 6c45f9233c Add more triple to conditional-tailcall.ll test
llvm-svn: 280835
2016-09-07 18:19:31 +00:00
Saleem Abdulrasool 02d9851c1c CodeGen: ensure that libcalls are always AAPCS CC
The original commit was too aggressive about marking LibCalls as AAPCS.  The
libcalls contain libc/libm/libunwind calls which are not AAPCS, but C.

llvm-svn: 280833
2016-09-07 17:56:09 +00:00
Hans Wennborg 75e25f6812 X86: Fold tail calls into conditional branches where possible (PR26302)
When branching to a block that immediately tail calls, it is possible to fold
the call directly into the branch if the call is direct and there is no stack
adjustment, saving one byte.

Example:

  define void @f(i32 %x, i32 %y) {
  entry:
    %p = icmp eq i32 %x, %y
    br i1 %p, label %bb1, label %bb2
  bb1:
    tail call void @foo()
    ret void
  bb2:
    tail call void @bar()
    ret void
  }

before:

  f:
          movl    4(%esp), %eax
          cmpl    8(%esp), %eax
          jne     .LBB0_2
          jmp     foo
  .LBB0_2:
          jmp     bar

after:

  f:
          movl    4(%esp), %eax
          cmpl    8(%esp), %eax
          jne     bar
  .LBB0_1:
          jmp     foo

I don't expect any significant size savings from this (on a Clang bootstrap I
saw 288 bytes), but it does make the code a little tighter.

This patch only does 32-bit, but 64-bit would work similarly.

Differential Revision: https://reviews.llvm.org/D24108

llvm-svn: 280832
2016-09-07 17:52:14 +00:00
Davide Italiano ec9612da1a [lib/LTO] Add a way to run a custom pipeline
Differential Revision:  https://reviews.llvm.org/D24095

llvm-svn: 280830
2016-09-07 17:46:16 +00:00
Yaxun Liu 638914009a AMDGPU: Add hidden kernel arguments to runtime metadata
OpenCL kernels have hidden kernel arguments for global offset and printf buffer. For consistency, these hidden argument should be included in the runtime metadata. Also updated kernel argument kind metadata.

Differential Revision: https://reviews.llvm.org/D23424

llvm-svn: 280829
2016-09-07 17:44:00 +00:00
Reid Kleckner a9f4cc9510 [codeview] Add new directives to record inlined call site line info
Summary:
Previously we were trying to represent this with the "contains" list of
the .cv_inline_linetable directive, which was not enough information.
Now we directly represent the chain of inlined call sites, so we know
what location to emit when we encounter a .cv_loc directive of an inner
inlined call site while emitting the line table of an outer function or
inlined call site. Fixes PR29146.

Also fixes PR29147, where we would crash when .cv_loc directives crossed
sections. Now we write down the section of the first .cv_loc directive,
and emit an error if any other .cv_loc directive for that function is in
a different section.

Also fixes issues with discontiguous inlined source locations, like in
this example:

  volatile int unlikely_cond = 0;
  extern void __declspec(noreturn) abort();
  __forceinline void f() {
    if (!unlikely_cond) abort();
  }
  int main() {
    unlikely_cond = 0;
    f();
    unlikely_cond = 0;
  }

Previously our tables gave bad location information for the 'abort'
call, and the debugger wouldn't snow the inlined stack frame for 'f'.
It is important to emit good line tables for this code pattern, because
it comes up whenever an asan bug occurs in an inlined function. The
__asan_report* stubs are generally placed after the normal function
epilogue, leading to discontiguous regions of inlined code.

Reviewers: majnemer, amccarth

Subscribers: llvm-commits

Differential Revision: https://reviews.llvm.org/D24014

llvm-svn: 280822
2016-09-07 16:15:31 +00:00
Justin Lebar 3a5f40c191 [LSV] Use the original loads' names for the extractelement instructions.
Summary:
LSV replaces multiple adjacent loads with one vectorized load and a
bunch of extractelement instructions.  This patch makes the
extractelement instructions' names match those of the original loads,
for (hopefully) improved readability.

Reviewers: asbirlea, tstellarAMD

Subscribers: arsenm, mzolotukhin

Differential Revision: https://reviews.llvm.org/D23748

llvm-svn: 280818
2016-09-07 15:49:48 +00:00
Simon Pilgrim d311311beb Fix typo in test - it should be masking bits0-15 not bit16
llvm-svn: 280816
2016-09-07 15:19:07 +00:00
Andrea Di Biagio bdd576dbb0 Regenerate vector bitcast folding tests using update_test_checks.py.
Two tests have been merged together, regenerated and then moved to
a more appropriate directory. No functional change.

llvm-svn: 280814
2016-09-07 14:50:07 +00:00
Simon Pilgrim 32cfa5ba83 [X86][SSE] Added or combine tests for known bits of vectors
Part of the yak shaving for D24253

llvm-svn: 280813
2016-09-07 14:49:50 +00:00
Simon Pilgrim 65cdc058b6 [X86][SSE] Added and+or+zext combine tests for known bits of vectors
Part of the yak shaving for D24253

llvm-svn: 280810
2016-09-07 14:00:52 +00:00
Simon Pilgrim 2415144425 [X86][SSE] Added and+or combine tests currently failing with vectors
(and (or x, C), D) -> D if (C & D) == D

Part of the yak shaving for D24253

llvm-svn: 280809
2016-09-07 13:40:03 +00:00
Pablo Barrio fc752bb70a [ARM] Lower UDIV+UREM to UDIV+MLS (and the same for SREM)
Summary:
This saves a library call to __aeabi_uidivmod. However, the
processor must feature hardware division in order to benefit from
the transformation.

Reviewers: scott-0, jmolloy, compnerd, rengolin

Subscribers: t.p.northover, compnerd, aemerson, rengolin, samparker, llvm-commits

Differential Revision: https://reviews.llvm.org/D24133

llvm-svn: 280808
2016-09-07 12:49:15 +00:00
Andrea Di Biagio f3fd316223 [InstCombine][SSE4a] Fix assertion failure in the insertq/insertqi combining logic.
This fixes a similar issue to the one already fixed by r280804
(revieved in D24256). Revision 280804 fixed the problem with unsafe dyn_casts
in the extrq/extrqi combining logic. However, it turns out that even the
insertq/insertqi logic was affected by the same problem.

llvm-svn: 280807
2016-09-07 12:47:53 +00:00
Andrea Di Biagio 8df5b9cf48 [InstCombine][SSE4a] Fix assertion failure caused by unsafe dyn_casts on the operands of extrq/extrqi intrinsic calls.
This patch fixes an assertion failure caused by unsafe dynamic casts on the
constant operands of sse4a intrinsic calls to extrq/extrqi

The combine logic that simplifies sse4a extrq/extrqi intrinsic calls currently
checks if the input operands are constants. Internally, that logic relies on
dyn_casts of values returned by calls to method Constant::getAggregateElement.
However, method getAggregateElemet may return nullptr if the constant element
cannot be retrieved. So, all the dyn_casts can potentially fail. This is what
happens for example if a constexpr value is passed in input to an extrq/extrqi
intrinsic call.

This patch fixes the problem by using a dyn_cast_or_null (instead of a simple
dyn_cast) on the result of each call to Constant::getAggregateElement.

Added reproducible test cases to x86-sse4a.ll.

Differential Revision: https://reviews.llvm.org/D24256

llvm-svn: 280804
2016-09-07 12:03:03 +00:00
Vasileios Kalintiris 1ed49fd384 [mips] Disable the TImode shift libcalls for 32-bit targets.
Summary:
The o32 ABI doesn't not support the TImode helpers. For the time being,
disable just the shift libcalls as they break recursive builds on MIPS.

Reviewers: sdardis

Subscribers: llvm-commits, sdardis

Differential Revision: https://reviews.llvm.org/D24259

llvm-svn: 280798
2016-09-07 10:01:18 +00:00
James Molloy ec905a62ae [SimplifyCFG] Update workaround for PR30188 to also include loads
I should have realised this the first time around, but if we're avoiding sinking stores where the operands come from allocas so they don't create selects, we also have to do the same for loads because SROA will be just as defective looking at loads of selected addresses as stores.

Fixes PR30188 (again).

llvm-svn: 280792
2016-09-07 08:40:20 +00:00
James Molloy bf1837d9c9 [SimplifyCFG] Check PHI uses more accurately
PR30292 showed a case where our PHI checking wasn't correct. We were checking that all values were used by the same PHI before deciding to sink, but we weren't checking that the incoming values for that PHI were what we expected. As a result, we had to bail out after block splitting which caused us to never reach a steady state in SimplifyCFG.

Fixes PR30292.

llvm-svn: 280790
2016-09-07 08:15:54 +00:00
Hal Finkel 42c83f131e [PowerPC] Fix address-offset folding for plain addi
When folding an addi into a memory access that can take an immediate offset, we
were implicitly assuming that the existing offset was zero. This was incorrect.
If we're dealing with an addi with a plain constant, we can add it to the
existing offset (assuming that doesn't overflow the immediate, etc.), but if we
have anything else (i.e. something that will become a relocation expression),
we'll go back to requiring the existing immediate offset to be zero (because we
don't know what the requirements on that relocation expression might be - e.g.
maybe it is paired with some addis in some relevant way).

On the other hand, when dealing with a plain addi with a regular constant
immediate, the alignment restrictions (from the TOC base pointer, etc.) are
irrelevant.

I've added the test case from PR30280, which demonstrated the bug, but also
demonstrates a missed optimization opportunity (i.e. we don't need the memory
accesses at all).

Fixes PR30280.

llvm-svn: 280789
2016-09-07 07:36:11 +00:00
Elena Demikhovsky f0ddd1b8b5 AVX512F: FMA intrinsic + FNEG - sequence optimization
The previous commit (r280368 - https://reviews.llvm.org/D23313) does not cover AVX-512F, KNL set.
FNEG(x) operation is lowered to (bitcast (vpxor (bitcast x), (bitcast constfp(0x80000000))).
It happens because FP XOR is not supported for 512-bit data types on KNL and we use integer XOR instead.
I added pattern match for integer XOR.

Differential Revision: https://reviews.llvm.org/D24221

llvm-svn: 280785
2016-09-07 06:54:28 +00:00
Craig Topper b880ad3a71 [AVX-512] Add support for commuting masked instructions in findCommutedOpIndices. The default implementation doesn't skip the mask input or the preserved input.
llvm-svn: 280781
2016-09-07 04:46:11 +00:00
Saleem Abdulrasool a7ade33d16 Revert "CodeGen: ensure that libcalls are always AAPCS CC"
This reverts SVN r280683.  Revert until I figure out why this is breaking lli
tests.

llvm-svn: 280778
2016-09-07 03:17:19 +00:00
Zachary Turner c998571493 Re-add "Make FieldList records print as a YAML sequence"
This was originally submitted in r280549, and reverted in r280577
due to breaking one MSVC buildbot.  The issue is that MSVC 2013
doesn't synthesize move constructors.  So even though i was
writing std::move(A) it was copying it, leading to a bogus ArrayRef.
The solution here is to simply remove the std::vector<> from the
type, since it is unused and unnecessary.  This way the ArrayRef
continues to point into the original memory backing the CVType.

llvm-svn: 280769
2016-09-06 23:45:47 +00:00
Hal Finkel 8ca2ed22b2 [DAGCombine] More fixups to SETCC legality checking (visitANDLike/visitORLike)
I might have called this "r246507, the sequel". It fixes the same issue, as the
issue has cropped up in a few more places. The underlying problem is that
isSetCCEquivalent can pick up select_cc nodes with a result type that is not
legal for a setcc node to have, and if we use that type to create new setcc
nodes, nothing fixes that (and so we've violated the contract that the
infrastructure has with the backend regarding setcc node types).

Fixes PR30276.

For convenience, here's the commit message from r246507, which explains the
problem is greater detail:

[DAGCombine] Fixup SETCC legality checking

SETCC is one of those special node types for which operation actions (legality,
etc.) is keyed off of an operand type, not the node's value type. This makes
sense because the value type of a legal SETCC node is determined by its
operands' value type (via the TLI function getSetCCResultType). When the
SDAGBuilder creates SETCC nodes, it either creates them with an MVT::i1 value
type, or directly with the value type provided by TLI.getSetCCResultType.

The first problem being fixed here is that DAGCombine had several places
querying TLI.isOperationLegal on SETCC, but providing the return of
getSetCCResultType, instead of the operand type directly. This does not mean
what the author thought, and "luckily", most in-tree targets have SETCC with
Custom lowering, instead of marking them Legal, so these checks return false
anyway.

The second problem being fixed here is that two of the DAGCombines could create
SETCC nodes with arbitrary (integer) value types; specifically, those that
would simplify:

  (setcc a, b, op1) and|or (setcc a, b, op2) -> setcc a, b, op3
     (which is possible for some combinations of (op1, op2))

If the operands of the and|or node are actual setcc nodes, then this is not an
issue (because the and|or must share the same type), but, the relevant code in
DAGCombiner::visitANDLike and DAGCombiner::visitORLike actually calls
DAGCombiner::isSetCCEquivalent on each operand, and that function will
recognise setcc-like select_cc nodes with other return types. And, thus, when
creating new SETCC nodes, we need to be careful to respect the value-type
constraint. This is even true before type legalization, because it is quite
possible for the SELECT_CC node to have a legal type that does not happen to
match the corresponding TLI.getSetCCResultType type.

To be explicit, there is nothing that later fixes the value types of SETCC
nodes (if the type is legal, but does not happen to match
TLI.getSetCCResultType). Creating SETCCs with an MVT::i1 value type seems to
work only because, either MVT::i1 is not legal, or it is what
TLI.getSetCCResultType returns if it is legal. Fixing that is a larger change,
however. For the time being, restrict the relevant transformations to produce
only SETCC nodes with a value type matching TLI.getSetCCResultType (or MVT::i1
prior to type legalization).

Fixes PR24636.

llvm-svn: 280767
2016-09-06 23:02:23 +00:00
Ying Yi 24e91bd05f [llvm-cov] Add the project summary to the text coverage report for each source file.
This patch is a spin-off from https://reviews.llvm.org/D23922. It extends the text view to preserve the same feature as the html view.

Differential Revision: https://reviews.llvm.org/D24241

llvm-svn: 280756
2016-09-06 21:41:38 +00:00
Konstantin Zhuravlyov 864718c666 [AMDGPU] Wave and register controls
- Add missing test

llvm-svn: 280749
2016-09-06 20:29:10 +00:00
Konstantin Zhuravlyov 1d65026ca6 [AMDGPU] Wave and register controls
- Implemented amdgpu-flat-work-group-size attribute
- Implemented amdgpu-num-active-waves-per-eu attribute
- Implemented amdgpu-num-sgpr attribute
- Implemented amdgpu-num-vgpr attribute
- Dynamic LDS constraints are in a separate patch

Patch by Tom Stellard and Konstantin Zhuravlyov

Differential Revision: https://reviews.llvm.org/D21562

llvm-svn: 280747
2016-09-06 20:22:28 +00:00
Wei Ding 5e832e866e AMDGPU : Add XNACK feature to GPUs that support it.
Differential Revision: http://reviews.llvm.org/D24276

llvm-svn: 280742
2016-09-06 19:55:17 +00:00
Ying Yi d36b47c481 [llvm-cov] Add the "Go to first unexecuted line" feature.
This patch provides easy navigation to find the zero count lines, especially useful when the source file is very large.

Differential Revision: https://reviews.llvm.org/D23277

llvm-svn: 280739
2016-09-06 19:31:18 +00:00
Rafael Espindola b940b66c60 Add an c++ itanium demangler to llvm.
This adds a copy of the demangler in libcxxabi.

The code also has no dependencies on anything else in LLVM. To enforce
that I added it as another library. That way a BUILD_SHARED_LIBS will
fail if anyone adds an use of StringRef for example.

The no llvm dependency combined with the fact that this has to build
on linux, OS X and Windows required a few changes to the code. In
particular:

    No constexpr.
    No alignas

On OS X at least this library has only one global symbol:
__ZN4llvm16itanium_demangleEPKcPcPmPi

My current plan is:

    Commit something like this
    Change lld to use it
    Change lldb to use it as the fallback

    Add a few #ifdefs so that exactly the same file can be used in
    libcxxabi to export abi::__cxa_demangle.

Once the fast demangler in lldb can handle any names this
implementation can be replaced with it and we will have the one true
demangler.

llvm-svn: 280732
2016-09-06 19:16:48 +00:00
Krzysztof Parzyszek 7c9b012629 [RDF] Ignore undef use operands
llvm-svn: 280717
2016-09-06 17:03:13 +00:00
Simon Pilgrim 1b4462b7c1 [SelectionDAG] Simplify extract_subvector( insert_subvector ( Vec, In, Idx ), Idx ) -> In
If we are extracting a subvector that has just been inserted then we should just use the original inserted subvector.

This has come up in certain several x86 shuffle lowering cases where we are crossing 128-bit lanes.

Differential Revision: https://reviews.llvm.org/D24254

llvm-svn: 280715
2016-09-06 16:42:05 +00:00
Adam Nemet c520822dbf [JumpThreading] Only write back branch-weight MDs for blocks that originally had PGO info
Currently the pass updates branch weights in the IR if the function has
any PGO info (entry frequency is set).  However we could still have
regions of the CFG that does not have branch weights collected (e.g. a
cold region).  In this case we'd use static estimates.  Since static
estimates for branches are determined independently, they are
inconsistent.  Updating them can "randomly" inflate block frequencies.

I've run into this in a completely cold loop of h264ref from
SPEC.  -Rpass-with-hotness showed the loop to be completely cold during
inlining (before JT) but completely hot during vectorization (after JT).

The new testcase demonstrate the problem.  We check array elements
against 1, 2 and 3 in a loop.  The check against 3 is the loop-exiting
check.  The block names should be self-explanatory.

In this example, jump threading incorrectly updates the weight of the
loop-exiting branch to 0, drastically inflating the frequency of the
loop (in the range of billions).

There is no run-time profile info for edges inside the loop, so branch
probabilities are estimated.  These are the resulting branch and block
frequencies for the loop body:

                check_1 (16)
            (8) /  |
            eq_1   | (8)
                \  |
                check_2 (16)
            (8) /  |
            eq_2   | (8)
                \  |
                check_3 (16)
            (1) /  |
       (loop exit) | (15)
                   |
              (back edge)

First we thread eq_1 -> check_2 to check_3.  Frequencies are updated to
remove the frequency of eq_1 from check_2 and then from the false edge
leaving check_2.  Changed frequencies are highlighted with * *:

                check_1 (16)
            (8) /  |
           eq_1~   | (8)
           /       |
          /     check_2 (*8*)
         /  (8) /  |
         \  eq_2   | (*0*)
          \     \  |
           ` --- check_3 (16)
            (1) /  |
       (loop exit) | (15)
                   |
              (back edge)

Next we thread eq_1 -> check_3 and eq_2 -> check_3 to check_1 as new
back edges.  Frequencies are updated to remove the frequency of eq_1 and
eq_3 from check_3 and then the false edge leaving check_3 (changed
frequencies are highlighted with * *):

                  check_1 (16)
              (8) /  |
             eq_1~   | (8)
             /       |
            /     check_2 (*8*)
           /  (8) /  |
          /-- eq_2~  | (*0*)
  (back edge)        |
                  check_3 (*0*)
            (*0*) /  |
         (loop exit) | (*0*)
                     |
                (back edge)

As a result, the loop exit edge ends up with 0 frequency which in turn makes
the loop header to have maximum frequency.

There are a few potential problems here:

1. The profile data seems odd.  There is a single profile sample of the
loop being entered.  On the other hand, there are no weights inside the
loop.

2. Based on static estimation we shouldn't set edges to "extreme"
values, i.e. extremely likely or unlikely.

3. We shouldn't create profile metadata that is calculated from static
estimation.  I am not sure what policy is but it seems to make sense to
treat profile metadata as something that is known to originate from
profiling.  Estimated probabilities should only be reflected in BPI/BFI.

Any one of these would probably fix the immediate problem.  I went for 3
because I think it's a good policy to have and added a FIXME about 2.

Differential Revision: https://reviews.llvm.org/D24118

llvm-svn: 280713
2016-09-06 16:08:33 +00:00
Simon Dardis b432a3ed7e [mips] Tighten FastISel restrictions
LLVM PR/29052 highlighted that FastISel for MIPS attempted to lower
arguments assuming that it was using the paired 32bit registers to
perform operations for f64. This mode of operation is not supported
for MIPSR6.

This patch resolves the reported issue by adding additional checks
for unsupported floating point unit configuration.

Thanks to mike.k for reporting this issue!

Reviewers: seanbruno, vkalintiris

Differential Review: https://reviews.llvm.org/D23795

llvm-svn: 280706
2016-09-06 12:36:24 +00:00
Krzysztof Parzyszek 020ec299bf [PPC] Claim stack frame before storing into it, if no red zone is present
Unlike PPC64, PPC32/SVRV4 does not have red zone. In the absence of it 
there is no guarantee that this part of the stack will not be modified 
by any interrupt. To avoid this, make sure to claim the stack frame first
before storing into it.

This fixes https://llvm.org/bugs/show_bug.cgi?id=26519.

Differential Revision: https://reviews.llvm.org/D24093

llvm-svn: 280705
2016-09-06 12:30:00 +00:00
Craig Topper 4fa3b50fc3 [AVX-512] Fix masked VPERMI2PS isel when the index comes from a bitcast.
We need to bitcast the index operand to a floating point type so that it matches the result type. If not then the passthru part of the DAG will be a bitcast from the index's original type to the destination type. This makes it very difficult to match. The other option would be to add 5 sets of patterns for every other possible type.

llvm-svn: 280696
2016-09-06 06:56:59 +00:00
Craig Topper cf9f1b8dfa [AVX-512] Add a test case to show that we don't select masked vpermi2ps when the index operand comes from a bitcast.
It doesn't work because we're looking for a bitcast from the v4i32 index operand to v4f32 for the passthru part of the DAG. But since the index is bitcasted from v2i64 and bitcasts fold, we actually have a bitcast from v2i64 to v4f32 in the passthru part of the DAG.

Taken from optimized output from clang's test case.

llvm-svn: 280695
2016-09-06 05:45:27 +00:00
Saleem Abdulrasool bfa25bd1ac ARM: workaround bundled operation predication
This is a Windows ARM specific issue.  If the code path in the if conversion
ends up using a relocation which will form a IMAGE_REL_ARM_MOV32T, we end up
with a bundle to ensure that the mov.w/mov.t pair is not split up.  This is
normally fine, however, if the branch is also predicated, then we end up trying
to predicate the bundle.

For now, report a bundle as being unpredicatable.  Although this is false, this
would trigger a failure case previously anyways, so this is no worse.  That is,
there should not be any code which would previously have been if converted and
predicated which would not be now.

Under certain circumstances, it may be possible to "predicate the bundle".  This
would require scanning all bundle instructions, and ensure that the bundle
contains only predicatable instructions, and converting the bundle into an IT
block sequence.  If the bundle is larger than the maximal IT block length (4
instructions), it would require materializing multiple IT blocks from the single
bundle.

llvm-svn: 280689
2016-09-06 04:00:12 +00:00
Craig Topper 62d0a5e7d3 [AVX-512] Fix v8i64 shift by immediate lowering on 32-bit targets.
llvm-svn: 280684
2016-09-06 00:31:10 +00:00
Saleem Abdulrasool a6519b1d54 CodeGen: ensure that libcalls are always AAPCS CC
All of the builtins are designed to be invoked with ARM AAPCS CC even on ARM
AAPCS VFP CC hosts.  Tweak the default initialisation to ARM AAPCS CC rather
than C CC for ARM/thumb targets.

The changes to the tests are necessary to ensure that the calling convention for
the lowered library calls are honoured.  Furthermore, these adjustments cause
certain branch invocations to change to branch-and-link since the returned value
needs to be moved across registers (d0 -> r0, r1).

llvm-svn: 280683
2016-09-06 00:28:43 +00:00
Craig Topper dfc4fc9f02 [AVX-512] Teach fastisel load/store handling to use EVEX encoded instructions for 128/256-bit vectors and scalar single/double.
Still need to fix the register classes to allow the extended range of registers.

llvm-svn: 280682
2016-09-05 23:58:40 +00:00
Craig Topper 70e1348031 [X86] Update fast-isel store test to have more 256 and 512-bit test cases. Add command lines for AVX and AVX512 feature sets.
llvm-svn: 280681
2016-09-05 23:58:37 +00:00
Craig Topper f54ebca2dd [X86] Update fast-isel vector load test to have more 256 and 512-bit test cases. Add a command line for SKX features too.
llvm-svn: 280680
2016-09-05 23:58:32 +00:00
Sanjay Patel e341c919b0 fix FileCheck variables for test added with r280677
The script (utils/update_test_checks.py) seems to have problems 
with variable names that start with the same string. 

llvm-svn: 280679
2016-09-05 23:49:32 +00:00
Gor Nishanov ccabaca273 [Coroutines] Part12: Handle alloca address-taken
Summary:
Move early uses of spilled variables after CoroBegin.

For example, if a parameter had address taken, we may end up with the code
like:
        define @f(i32 %n) {
          %n.addr = alloca i32
          store %n, %n.addr
          ...
          call @coro.begin

This patch fixes the problem by moving uses of spilled variables after CoroBegin.

Reviewers: majnemer

Subscribers: mehdi_amini, llvm-commits

Differential Revision: https://reviews.llvm.org/D24234

llvm-svn: 280678
2016-09-05 23:45:45 +00:00
Sanjay Patel eea2ef7862 [InstCombine] don't assert that division-by-constant has been folded (PR30281)
This is effectively a revert of:
https://reviews.llvm.org/rL280115

And this should fix
https://llvm.org/bugs/show_bug.cgi?id=30281:

llvm-svn: 280677
2016-09-05 23:38:22 +00:00
Sanjay Patel 46f9df5b71 [InstCombine] revert r280637 because it causes test failures on an ARM bot
http://lab.llvm.org:8011/builders/clang-cmake-armv7-a15/builds/14952/steps/ninja%20check%201/logs/FAIL%3A%20LLVM%3A%3Aicmp.ll

llvm-svn: 280676
2016-09-05 22:36:32 +00:00
Simon Pilgrim 6dd4b3526a [X86][SSE] Add test cases for PR29078
'Failure to recognise i64 sitofp/uitofp conversions that can be performed as i32'

llvm-svn: 280671
2016-09-05 18:11:17 +00:00
Simon Pilgrim efab350967 [X86][SSE] Add test cases for PR29079
'Failure to recognise uitofp conversions that can be performed as sitofp'

llvm-svn: 280670
2016-09-05 18:04:38 +00:00
Simon Pilgrim 5f14ff75ec [X86][SSE] Regenerate odd shuffle tests with common prefixes
llvm-svn: 280661
2016-09-05 14:15:38 +00:00
Oliver Stannard ef38d53a7e [SimplifyCFG] Add test for sinking inline asm in if/else
This test code previously caused a failure in the module verifier,
because SimplifyCFG created this invalid instruction, which tries to
take the address of inline asm:
  %.sink = select i1 %1, i64 ()* asm "mov $0, #1", "=r", i64 ()* asm %"mov $0, #2", "=r"

This has been fixed recently, presumably by James Molloy's patches that
re-wrote and changed parts of SimplifyCFG, so this patch just adds a
regression test for it.

Differential Revision: https://reviews.llvm.org/D24231

llvm-svn: 280660
2016-09-05 13:49:26 +00:00
James Molloy 728cf85950 [Thumb1] Add relocations for fixups fixup_arm_thumb_{br,bcc}
These need to be mapped through to R_ARM_THM_JUMP{11,8} respectively.

Fixes PR30279.

llvm-svn: 280651
2016-09-05 08:29:15 +00:00
Igor Breger a2f8ca9a34 [AVX512] Fix v8i1 /v16i1 zext + bitcast lowering pattern. Explicitly zero upper bits.
Differential Revision: http://reviews.llvm.org/D23983

llvm-svn: 280650
2016-09-05 08:26:51 +00:00
Craig Topper d9ca3d97ef [AVX-512] Simplify X86InstrInfo::copyPhysReg for 128/256-bit vectors with AVX512, but not VLX. We should use the VEX opcodes and trust the register allocator to not use the extended XMM/YMM register space.
Previously we were extending to copying the whole ZMM register. The register allocator shouldn't use XMM16-31 or YMM16-31 in this configuration as the instructions to spill them aren't available.

llvm-svn: 280648
2016-09-05 06:43:06 +00:00
Gor Nishanov 0e18f75a92 [Coroutines] Part11: Add final suspend handling.
Summary:
A frontend may designate a particular suspend to be final, by setting the second argument of the coro.suspend intrinsic to true. Such a suspend point has two properties:

* it is possible to check whether a suspended coroutine is at the final suspend point via coro.done intrinsic;
* a resumption of a coroutine stopped at the final suspend point leads to undefined behavior. The only possible action for a coroutine at a final suspend point is destroying it via coro.destroy intrinsic.

This patch adds final suspend handling logic to CoroEarly and CoroSplit passes.
Now, the final suspend point example from docs\Coroutines.rst compiles and produces expected result (see test/Transform/Coroutines/ex5.ll).

Reviewers: majnemer

Subscribers: mehdi_amini, llvm-commits

Differential Revision: https://reviews.llvm.org/D24068

llvm-svn: 280646
2016-09-05 04:44:30 +00:00
Craig Topper 42b69dd961 [X86] Add AVX and AVX512 command lines to the vec_ss_load_fold test.
llvm-svn: 280645
2016-09-05 02:20:53 +00:00
Sanjay Patel c641e9d6ff [InstCombine] allow icmp (and X, C2), C1 folds for splat constant vectors
The code to calculate 'UsesRemoved' could be simplified.
As-is, that code is a victim of PR30273:
https://llvm.org/bugs/show_bug.cgi?id=30273

llvm-svn: 280637
2016-09-04 20:58:27 +00:00
Simon Pilgrim cded03f163 [X86] Regenerate x64 mmx/f64 return value tests
llvm-svn: 280634
2016-09-04 18:14:45 +00:00
Craig Topper 4177345d7f [AVX-512] Remove 128-bit and 256-bit masked floating point add/sub/mul/div intrinsics and upgrade to native IR.
llvm-svn: 280633
2016-09-04 18:13:33 +00:00
Lang Hames 38c7927b6f [ORC] Clone module flags metadata into the globals module in the
CompileOnDemandLayer.

Also contains a tweak to the orc-lazy jit in LLI to enable the test case.

llvm-svn: 280632
2016-09-04 17:53:30 +00:00
Simon Pilgrim 128047fde5 [X86] Regenerate trunc-store legalization test
llvm-svn: 280631
2016-09-04 17:50:03 +00:00
Simon Pilgrim c228f5c195 [X86][SSE] Regenerate fcmp/uitofp combine tests
llvm-svn: 280629
2016-09-04 17:16:01 +00:00
Igor Breger 7e2a0dfa0c revert r279960.
https://llvm.org/bugs/show_bug.cgi?id=30249

llvm-svn: 280625
2016-09-04 14:03:52 +00:00
Simon Pilgrim 9a36318c54 EOL fixes
llvm-svn: 280624
2016-09-04 13:30:46 +00:00
Dorit Nuzman abd15f69b2 [InstCombine] Preserve llvm.mem.parallel_loop_access metadata when replacing
memcpy with ld/st.

When InstCombine replaces a memcpy with loads+stores it does not copy over the
llvm.mem.parallel_loop_access from the memcpy instruction. This patch fixes
that.

Differential Revision: https://reviews.llvm.org/D23499

llvm-svn: 280617
2016-09-04 07:49:39 +00:00
Hal Finkel 73390c7acd [PowerPC] Zero-extend constants in FastISel
As it turns out, whether we zero-extend or sign-extend i8/i16 constants, which
are illegal types promoted to i32 on PowerPC, is a choice constrained by
assumptions within the infrastructure. Specifically, the logic in
FunctionLoweringInfo::ComputePHILiveOutRegInfo assumes that constant PHI
operands will be zero extended, and so, at least when materializing constants
that are PHI operands, we must do the same.

The rest of our fast-isel implementation does not appear to depend on the fact
that we were sign-extending i8/i16 constants, and all other targets also appear
to zero-extend small-bitwidth constants in fast-isel; we'll now do the same (we
had been doing this only for i1 constants, and sign-extending the others).

Fixes PR27721.

llvm-svn: 280614
2016-09-04 06:07:19 +00:00
Craig Topper af0d63d2e7 [AVX-512] Remove masked integer add/sub/mull intrinsics and upgrade to native IR.
llvm-svn: 280611
2016-09-04 02:09:53 +00:00
Joseph Tremoulet e92e0a9042 Fix inliner funclet unwind memoization
Summary:
The inliner may need to determine where a given funclet unwinds to,
and this determination may depend on other funclets throughout the
funclet tree.  The code that performs this walk in getUnwindDestToken
memoizes results to avoid redundant computations.  In the case that
a funclet's unwind destination is derived from its ancestor, there's
code to walk back down the tree from the ancestor updating the memo
map of its descendants to record the unwind destination.  This change
fixes that code to account for the case that some descendant has a
different unwind destination, which can happen if that unwind dest
is a descendant of the EHPad being queried and thus didn't determine
its unwind destination.

Also update test inline-funclets.ll, which is supposed to cover such
scenarios, to include a case that fails an assertion without this fix
but passes with it.

Fixes PR29151.


Reviewers: majnemer

Subscribers: llvm-commits

Differential Revision: https://reviews.llvm.org/D24117

llvm-svn: 280610
2016-09-04 01:23:20 +00:00
Xinliang David Li 241e6c7086 [Profile] preserve branch metadata lowering select in CGP
CGP currently drops select's MD_prof profile data when
generating conditional branch which can lead to bad
code layout. The patch fixes the issue.

Differential Revision: http://reviews.llvm.org/D24169

llvm-svn: 280600
2016-09-03 21:26:36 +00:00
Mehdi Amini ebb3434850 Fix ThinLTO crash with debug info
Because the recent change about ODR type uniquing in the context,
we can reach types defined in another module during IR linking.
This triggered some assertions in case we IR link without starting
from an empty module. To alleviate that, we can self-map metadata
defined in the destination module so that they won't be visited.

Differential Revision: https://reviews.llvm.org/D23841

llvm-svn: 280599
2016-09-03 21:12:33 +00:00
Craig Topper 907b580d72 [AVX-512] Add integer ADD/SUB instructions to load folding tables. Add an AVX512 stack folding test.
llvm-svn: 280593
2016-09-03 17:20:07 +00:00
Nicolai Haehnle 3bba6a8438 AMDGPU: Reduce the duration of whole-quad-mode
Summary:
This contains two changes that reduce the time spent in WQM, with the
intention of reducing bandwidth required by VMEM loads:

1. Sampling instructions by themselves don't need to run in WQM, only their
   coordinate inputs need it (unless of course there is a dependent sampling
   instruction). The initial scanInstructions step is modified accordingly.

2. When switching back from WQM to Exact, switch back as soon as possible.
   This affects the logic in processBlock.

This should always be a win or at best neutral.

There are also some cleanups (e.g. remove unused ExecExports) and some new
debugging output.

Reviewers: arsenm, tstellarAMD, mareko

Subscribers: arsenm, llvm-commits, kzhuravl

Differential Revision: http://reviews.llvm.org/D22092

llvm-svn: 280590
2016-09-03 12:26:38 +00:00
Nicolai Haehnle a246dccc26 AMDGPU: Fix an interaction between WQM and polygon stippling
Summary:
This fixes a rare bug in polygon stippling with non-monolithic pixel shaders.

The underlying problem is as follows: the prolog part contains the polygon
stippling sequence, i.e. a kill. The main part then enables WQM based on the
_reduced_ exec mask, effectively undoing most of the polygon stippling.

Since we cannot know whether polygon stippling will be used, the main part
of a non-monolithic shader must always return to exact mode to fix this
problem.

Reviewers: arsenm, tstellarAMD, mareko

Subscribers: arsenm, llvm-commits, kzhuravl

Differential Revision: https://reviews.llvm.org/D23131

llvm-svn: 280589
2016-09-03 12:26:32 +00:00
Matt Arsenault 46a0382ab2 AMDGPU: Do basic folding of class intrinsic
This allows more of the OCML builtin library to be
constant folded.

llvm-svn: 280586
2016-09-03 07:06:58 +00:00
Matt Arsenault 2510a31677 AMDGPU: Fix spilling of m0
readlane/writelane do not support using m0 as the output/input.
Constrain the register class of spill vregs to try to avoid this,
but also handle spilling of the physreg when necessary by inserting
an additional copy to a normal SGPR.

llvm-svn: 280584
2016-09-03 06:57:55 +00:00
Craig Topper 892ce56901 [AVX-512] Add EVEX encoded VPCMPEQ and VPCMPGT to the load folding tables.
llvm-svn: 280581
2016-09-03 04:37:50 +00:00
Nico Weber 05e78450be Revert r280549.
The test it added doesn't pass:
http://lab.llvm.org:8011/builders/clang-x64-ninja-win7/builds/15318/steps/ninja%20check%201/logs/FAIL%3A%20LLVM%3A%3Apdbdump-yaml-types.test

Command Output (stdout):
--
$ "D:/buildslave/clang-x64-ninja-win7/stage1/./bin\llvm-pdbdump.EXE" "pdb2yaml" "-tpi-stream" "D:\buildslave\clang-x64-ninja-win7\llvm\test\DebugInfo\PDB/Inputs/empty.pdb"
$ "D:/buildslave/clang-x64-ninja-win7/stage1/./bin\FileCheck.EXE" "-check-prefix=YAML" "D:\buildslave\clang-x64-ninja-win7\llvm\test\DebugInfo\PDB\pdbdump-yaml-types.test"
# command stderr:
D:\buildslave\clang-x64-ninja-win7\llvm\test\DebugInfo\PDB\pdbdump-yaml-types.test:36:7: error: expected string not found in input
YAML: Name: apartment
      ^
<stdin>:153:10: note: scanning from here
 Value: 161
         ^

llvm-svn: 280577
2016-09-03 03:18:49 +00:00
Hal Finkel 522e4d9d66 [PowerPC] Support asm parsing for bc[l][a][+-] mnemonics
PowerPC assembly code in the wild, so it seems, has things like this:

  bc+     12, 28, .L9

This is a bit odd because the '+' here becomes part of the BO field, and the BO
field is otherwise the first operand. Nevertheless, the ISA specification does
clearly say that the +- hint syntax applies to all conditional-branch mnemonics
(that test either CTR or a condition register, although not the forms which
check both), both basic and extended, so this is supposed to be valid.

This introduces some asm-parser-only definitions which take only the upper
three bits from the specified BO value, and the lower two bits are implied by
the +- suffix (via some associated aliases).

Fixes PR23646.

llvm-svn: 280571
2016-09-03 02:31:44 +00:00
Wei Mi c37307a5f4 Fix buildbot error.
Add -mtriple=x86_64-unknown-linux-gnu for the test and move it to CodeGen/X86.

llvm-svn: 280568
2016-09-03 01:43:28 +00:00
Hal Finkel 28842b96f3 [PowerPC] Add asm parser/disassembler support for hrfid,nap,slbmfev
These few book-III instructions are used by the Linux kernel.

Partially fixes PR24796.

llvm-svn: 280560
2016-09-02 23:42:01 +00:00
Hal Finkel 277736eee6 [PowerPC] Add support for the extended dcbf form and mnemonics
dcbf has an optional hint-like field, add support for the extended form and the
associated mnemonics (dcbfl and dcbflp).

Partially fixes PR24796.

llvm-svn: 280559
2016-09-02 23:41:54 +00:00
Zachary Turner 83849415de [codeview] Make FieldList records print as a yaml sequence.
Before we were kind of imitating the behavior of a Yaml sequence
by outputting each record one after the other.  This makes it a
little cumbersome when we want to go the other direction -- from
Yaml to Pdb.  So this treats FieldList records as no different than
any other list of records, by printing them as a Yaml sequence with
the exact same format.

llvm-svn: 280549
2016-09-02 22:19:01 +00:00
Xinliang David Li 40afd5c9e4 [Profile] handle select instruction in 'expect' lowering
Builtin expect lowering currently ignores select. This patch
fixes the issue

Differential Revision: http://reviews.llvm.org/D24166

llvm-svn: 280547
2016-09-02 22:03:40 +00:00
Hal Finkel 7b104d4721 [PowerPC] For larger offsets, when possible, fold offset into addis toc@ha
When we have an offset into a global, etc. that is accessed relative to the TOC
base pointer, and the offset is larger than the minimum alignment of the global
itself and the TOC base pointer (which is 8-byte aligned), we can still fold
the @toc@ha into the memory access, but we must update the addis instruction's
symbol reference with the offset as the symbol addend. When there is only one
use of the addi to be folded and only one use of the addis that would need its
symbol's offset adjusted, then we can make the adjustment and fold the @toc@l
into the memory access.

llvm-svn: 280545
2016-09-02 21:37:07 +00:00
Jan Vesely ea45746d5a AMDGPU/R600: EXTRACT_VECT_ELT should only bypass BUILD_VECTOR if the vectors have the same number of elements.
Fixes R600 piglit regressions since r280298

Differential Revision: https://reviews.llvm.org/D24174

llvm-svn: 280535
2016-09-02 20:13:19 +00:00
Krzysztof Parzyszek 3bf4aeccd6 Do not consider subreg defs as reads when computing subrange liveness
Subregister definitions are considered uses for the purpose of tracking
liveness of the whole register. At the same time, when calculating live
interval subranges, subregister defs should not be treated as uses.

Differential Revision: https://reviews.llvm.org/D24190

llvm-svn: 280532
2016-09-02 19:48:55 +00:00
Sanjay Patel 70277411d3 [InstCombine] auto-generate assertions for tighter checking
llvm-svn: 280531
2016-09-02 19:38:37 +00:00
Jan Vesely 00864886f4 AMDGPU/R600: Expand unaligned writes to local and global AS
LOCAL and GLOBAL AS only
PRIVATE needs special treatment

Differential Revision: https://reviews.llvm.org/D23971

llvm-svn: 280526
2016-09-02 19:07:06 +00:00
Jan Vesely cd6b12b12e AMDGPU: Reorganize store tests
Split by AS.
Merge with some prviously failing tests.

Differential Revision: https://reviews.llvm.org/D23969

llvm-svn: 280523
2016-09-02 18:52:28 +00:00
Reid Kleckner fa28396f97 [codeview] Use the correct max CV record length of 0xFF00
Previously we were splitting our records at 0xFFFF bytes, which the
Microsoft tools don't like.

Should fix failure on the new Windows self-host buildbot.

This length appears in microsoft-pdb/PDB/dbi/dbiimpl.h

llvm-svn: 280522
2016-09-02 18:43:27 +00:00
Kyle Butt 8699921c4b IfConversion: Fix bug introduced by rescanning diamonds.
Passing the wrong values for predicate-clobbering. Simple to miss.
Added an assert to make this easier to catch in the future.

llvm-svn: 280517
2016-09-02 18:29:26 +00:00
Wei Mi c54d1298f5 Split the store of a wide value merged from an int-fp pair into multiple stores.
For the store of a wide value merged from a pair of values, especially int-fp pair,
sometimes it is more efficent to split it into separate narrow stores, which can
remove the bitwise instructions or sink them to colder places.

Now the feature is only enabled on x86 target, and only store of int-fp pair is
splitted. It is possible that the application scope gets extended with perf evidence
support in the future.

Differential Revision: https://reviews.llvm.org/D22840

llvm-svn: 280505
2016-09-02 17:17:04 +00:00
Sanjay Patel 521f19f249 [InsttCombine] fold insertelement of constant into shuffle with constant operand (PR29126)
The motivating case occurs with SSE/AVX scalar intrinsics, so this is a first step towards
shrinking that to a single shufflevector.

Note that the transform is intentionally limited to shuffles that are equivalent to vector
selects to avoid creating arbitrary shuffle masks that may not lower well.

This should solve PR29126:
https://llvm.org/bugs/show_bug.cgi?id=29126

Differential Revision: https://reviews.llvm.org/D23886

llvm-svn: 280504
2016-09-02 17:05:43 +00:00
Matthew Simpson b65c230eab [LV] Ensure reverse interleaved group GEPs remain uniform
For uniform instructions, we're only required to generate a scalar value for
the first vector lane of each unroll iteration. Thus, if we have a reverse
interleaved group, computing the member index off the scalar GEP corresponding
to the last vector lane of its pointer operand technically makes the GEP
non-uniform. We should compute the member index off the first scalar GEP
instead.

I've added the updated member index computation to the existing reverse
interleaved group test.

llvm-svn: 280497
2016-09-02 16:19:22 +00:00
Andrea Di Biagio 805815f407 [instsimplify] Fix incorrect folding of an ordered fcmp with a vector of all NaN.
This patch fixes a crash caused by an incorrect folding of an ordered comparison
between a packed floating point vector and a splat vector of NaN.

An ordered comparison between a vector and a constant vector of NaN, should
always be folded into a constant vector where each element is i1 false.

Since revision 266175, SimplifyFCmpInst folds the ordered fcmp into a scalar
'false'. Later on, this would cause an assertion failure, since the value type
of the folded value doesn't match the expected value type of the uses of the
original instruction: "Assertion failed: New->getType() == getType() &&
"replaceAllUses of value with new value of different type!".

This patch fixes the issue and adds a test case to the already existing test
InstSimplify/floating-point-compares.ll.

Differential Revision: https://reviews.llvm.org/D24143

llvm-svn: 280488
2016-09-02 14:47:43 +00:00
Andrea Di Biagio fd503e5af3 [DAGcombiner] Fix incorrect sinking of a truncate into the operand of a shift.
This fixes a regression introduced by revision 268094.
Revision 268094 added the following dag combine rule:
// trunc (shl x, K) -> shl (trunc x), K => K < vt.size / 2

That rule converts a truncate of a shift-by-constant into a shift of a truncated
value. We do this only if the shift count is less than half the size in bits of
the truncated value (K < vt.size / 2).

The problem is that the constraint on the shift count is incorrect, so the rule
doesn't work well in some cases involving vector types. The combine rule should
have been written instead like this:
// trunc (shl x, K) -> shl (trunc x), K => K < vt.getScalarSizeInBits()

Basically, if K is smaller than the "scalar size in bits" of the truncated value
then we know that by "sinking" the truncate into the operand of the shift we
would never accidentally make the shift undefined.

This patch fixes the check on the shift count, and adds test cases to make sure
that we don't regress the behavior.

Differential Revision: https://reviews.llvm.org/D24154

llvm-svn: 280482
2016-09-02 11:29:09 +00:00
Alexey Bataev ed27d69037 [InstCombine] Add test for insertelementinsts with constants.
Added a tests that shows that several insertelementinsts with constant
indexes/data are not folded into a single shuffleinst.

llvm-svn: 280474
2016-09-02 09:00:53 +00:00
George Rimar d8a4ecac3b [llvm-readobj] - Teach readobj to print DT_AUXILIARY dynamic tag in human readable form.
Previously DT_AUXILIARY was unknown, patch fixes that.

Differential revision: https://reviews.llvm.org/D24138

llvm-svn: 280471
2016-09-02 07:35:19 +00:00
James Molloy f3cf2a494b [SimplifyCFG] Add a workaround to fix PR30188
We're sinking stores, which is a good thing, but in the process creating selects for the store address operand, which SROA/Mem2Reg can't look through, which caused serious regressions.

The real fix is in SROA, which I'll be looking into.

llvm-svn: 280470
2016-09-02 07:29:00 +00:00
Craig Topper ad79bf4712 [AVX-512] Move tests for masked floating point logical operations to avx512dqvl-intrinsics-upgrade.ll since they have now been autoupgraded.
llvm-svn: 280467
2016-09-02 06:11:31 +00:00
Craig Topper 45d6503089 [AVX-512] Add more patterns for masked and broadcasted logical operations where the select or broadcast has a floating point type.
These are needed in order to remove the masked floating point logical operation intrinsics and use native IR.

llvm-svn: 280465
2016-09-02 05:29:13 +00:00
Craig Topper 00aecd97bf [AVX-512] Add execution domain fixing for logical operations with broadcast loads. This builds on the handling of masked ops since we need to keep element size the same.
llvm-svn: 280464
2016-09-02 05:29:09 +00:00
Hal Finkel 5ef4b03106 [PowerPC] hasAndNotCompare should return true
As Sanjay suggested when he added the hook, PPC should return true from
hasAndNotCompare. We have an efficient negated 'and' on PPC (which can feed a
compare).

Fixes PR27203.

llvm-svn: 280457
2016-09-02 02:58:25 +00:00
Hal Finkel a39fd4bc53 [PowerPC] Add a pattern for a runtime bit check
Following a suggestion by Sanjay, we should lower:

  %shl = shl i32 1, %y
  %and = and i32 %x, %shl
  %cmp = icmp eq i32 %and, %shl
  ret i1 %cmp

into:

  subfic r4, r4, 32
  rlwnm r3, r3, r4, 31, 31

Add this pattern and some associated patterns for the 64-bit case and the
not-equal case. Fixes PR27356.

llvm-svn: 280454
2016-09-02 02:34:44 +00:00
NAKAMURA Takumi 9aec5c3797 llvm/test/Transforms/GCOVProfiling/three-element-mdnode.ll: Use %/T instead of %T, not to emit backslashes.
llvm-svn: 280451
2016-09-02 01:33:00 +00:00
Hal Finkel b54579fab6 [PowerPC] Don't apply the PPC64 address-formation peephole for offsets greater than 7
When applying our address-formation PPC64 peephole, we are reusing the @ha TOC
addis value with the low parts associated with different offsets (i.e.
different effective symbol addends). We were assuming this was okay so long as
the offsets were less than the alignment of the global variable being accessed.
This ignored the fact, however, that the TOC base pointer itself need only be
8-byte aligned. As a result, what we were doing is legal only for offsets less
than 8 regardless of the alignment of the object being accessed.

Fixes PR28727.

llvm-svn: 280441
2016-09-02 00:28:20 +00:00
Hal Finkel 1e8218cc09 [PowerPC] Don't consider fusion in PPC64 address-formation peephole
The logic in this function assumes that the P8 supports fusion of addis/addi,
but it does not. As a result, there is no advantage to restricting our peephole
application, merging addi instructions into dependent memory accesses, even
when the addi has multiple users, regardless of whether or not we're optimizing
for size.

We might need something like this again for the P9; I suspect we'll revisit
this code when we work on P9 tuning.

llvm-svn: 280440
2016-09-02 00:27:50 +00:00
Michael Kuperstein 7bc54cebea [Legalizer] Don't throw away false low half when expanding GT/LT SETCC
When expanding a SETCC for which the low half is known to evaluate to false,
we can only throw it away for LT/GT comparisons, not LE/GE.

This fixes PR29170.

Differential Revision: https://reviews.llvm.org/D24151

llvm-svn: 280424
2016-09-01 23:02:32 +00:00
Michael Kuperstein 5f17d08f49 [SelectionDAG] Generate vector_shuffle nodes for undersized result vector sizes
Prior to this, we could generate a vector_shuffle from an IR shuffle when the
size of the result was exactly the sum of the sizes of the input vectors.
If the output vector was narrower - e.g. a <12 x i8> being formed by a shuffle
with two <8 x i8> inputs - we would lower the shuffle to a sequence of extracts
and inserts.

Instead, we can form a larger vector_shuffle, and then extract a subvector
of the right size - e.g. shuffle the two <8 x i8> inputs into a <16 x i8>
and then extract a <12 x i8>.

This also includes a target-specific X86 combine that in the presence of
AVX2 combines:
(vector_shuffle <mask> (concat_vectors t1, undef)
                       (concat_vectors t2, undef))
into:
(vector_shuffle <mask> (concat_vectors t1, t2), undef)
in cases where this allows us to form VPERMD/VPERMQ.

(This is not a separate commit, as that pattern does not appear without
the DAGBuilder change.)

llvm-svn: 280418
2016-09-01 21:32:09 +00:00
Heejin Ahn c0f18172f5 [WebAssembly] Add asm.js-style setjmp/longjmp handling for wasm (reland r280302)
Summary: This patch adds asm.js-style setjmp/longjmp handling support for WebAssembly. It also uses JavaScript's try and catch mechanism.

Reviewers: jpp, dschuff

Subscribers: jfb, dschuff

Differential Revision: https://reviews.llvm.org/D24121

llvm-svn: 280415
2016-09-01 21:05:15 +00:00
Tim Northover 8d8812c5d7 GlobalISel: add a G_PHI instruction to give phis a type.
They're another source of generic vregs, which are going to need a type on the
definition when we remove the register width from MachineRegisterInfo.

llvm-svn: 280412
2016-09-01 20:45:41 +00:00
Sanjay Patel 199a8e6a92 [InstCombine] add tests to show potential shuffle+insert folds
llvm-svn: 280403
2016-09-01 19:14:19 +00:00
Andrey Turetskiy cde38b6a99 [X86] Loosen memory folding requirements for cvtdq2pd and cvtps2pd instructions.
According to spec cvtdq2pd and cvtps2pd instructions don't require memory operand to be aligned
to 16 bytes. This patch removes this requirement from the memory folding table.

Differential Revision: https://reviews.llvm.org/D23919

llvm-svn: 280402
2016-09-01 18:50:02 +00:00
Yaxun Liu add05a8d95 AMDGPU: Add runtime metadata for pointee alignment of argument.
Add runtime metdata for pointee alignment of pointer type kernel argument. The key is KeyArgPointeeAlign and the value is a 32 bit unsigned integer.

Differential Revision: https://reviews.llvm.org/D24145

llvm-svn: 280399
2016-09-01 18:46:49 +00:00
Michael Kuperstein 65bc3c89ff [DAGCombine] Don't fold a trunc if it feeds an anyext
Legalization tends to create anyext(trunc) patterns. This should always be
combined - into either a single trunc, a single ext, or nothing if the
types match exactly. But if we happen to combine the trunc first, we may pull
the trunc away from the anyext or make it implicit (e.g. the truncate(extract)
-> extract(bitcast) fold).

To prevent this, we can avoid doing the fold, similarly to how we already handle
fpround(fpextend).

Differential Revision: https://reviews.llvm.org/D23893

llvm-svn: 280386
2016-09-01 17:59:24 +00:00
Simon Dardis bd27154757 [mips] interAptiv based generic schedule model
This scheduler describes a processor which covers all MIPS ISAs based
around the interAptiv and P5600 timings.

Reviewers: vkalintiris, dsanders

Differential Revision: https://reviews.llvm.org/D23551

llvm-svn: 280374
2016-09-01 14:53:53 +00:00
Sanjay Patel dd861964d1 [InstCombine] remove fold of an icmp pattern that should never happen
While removing a scalar shackle from an icmp fold, I noticed that I couldn't find any tests to trigger
this code path.

The 'and' shrinking transform should be handled by InstCombiner::foldCastedBitwiseLogic()
or eliminated with InstSimplify. The icmp narrowing is part of InstCombiner::foldICmpWithCastAndCast().

Differential Revision: https://reviews.llvm.org/D24031 

llvm-svn: 280370
2016-09-01 14:20:43 +00:00
Krzysztof Parzyszek 07d9f53b51 [Hexagon] Deal with undefs when extending live intervals
Reapply r280275, since MSVC accepts r280358.

llvm-svn: 280369
2016-09-01 13:59:35 +00:00
Elena Demikhovsky 4d7738dfde Optimized FMA intrinsic + FNEG , like
-(a*b+c)

and FNEG + FMA, like
a*b-c or (-a)*b+c.

The bug description is here :  https://llvm.org/bugs/show_bug.cgi?id=28892

Differential revision: https://reviews.llvm.org/D23313

llvm-svn: 280368
2016-09-01 13:58:53 +00:00
James Molloy 88cad7e5cf [SimplifyCFG] Handle tail-sinking of more than 2 incoming branches
This was a real restriction in the original version of SinkIfThenCodeToEnd. Now it's been rewritten, the restriction can be lifted.

As part of this, we handle a very common and useful case where one of the incoming branches is actually conditional. Consider:

   if (a)
     x(1);
   else if (b)
     x(2);

This produces the following CFG:

         [if]
        /    \
      [x(1)] [if]
        |     | \
        |     |  \
        |  [x(2)] |
         \    |  /
          [ end ]

[end] has two unconditional predecessor arcs and one conditional. The conditional refers to the implicit empty 'else' arc. This same pattern can also be caused by an empty default block in a switch.

We can't sink the call to x() down to end because no call to x() happens on the third incoming arc (assume that x() has sideeffects for the sake of argument; if something is safe to speculate we could indeed sink nevertheless but this cannot happen in the general case and causes many extra selects).

We are now able to detect this case and split off the unconditional arcs to a common successor:

         [if]
        /    \
      [x(1)] [if]
        |     | \
        |     |  \
        |  [x(2)] |
         \   /    |
     [sink.split] |
           \     /
           [ end ]

Now we can sink the call to x() into %sink.split. This can cause significant code simplification in many testcases.

llvm-svn: 280364
2016-09-01 12:58:13 +00:00
James Molloy eec6df3193 [SimplifyCFG] Change the algorithm in SinkThenElseCodeToEnd
r279460 rewrote this function to be able to handle more than two incoming edges and took pains to ensure this didn't regress anything.

This time we change the logic for determining if an instruction should be sunk. Previously we used a single pass greedy algorithm - sink instructions until one requires more than one PHI node or we run out of instructions to sink.

This had the problem that sinking instructions that had non-identical but trivially the same operands needed extra logic so we sunk them aggressively. For example:

    %a = load i32* %b          %d = load i32* %b
    %c = gep i32* %a, i32 0    %e = gep i32* %d, i32 1

Sinking %c and %e would naively require two PHI merges as %a != %d. But the loads are obviously equivalent (and maybe can't be hoisted because there is no common predecessor).

This is why we implemented the fairly complex function areValuesTriviallySame(), to look through trivial differences like this. However it's just not clever enough.

Instead, throw areValuesTriviallySame away, use pointer equality to check equivalence of operands and switch to a two-stage algorithm.

In the "scan" stage, we look at every sinkable instruction in isolation from end of block to front. If it's sinkable, we keep track of all operands that required PHI merging.

In the "sink" stage, we iteratively sink the last non-terminator in the source blocks. But when calculating how many PHIs are actually required to be inserted (to work out if we should stop or not) we remove any values that have already been sunk from the set of PHI-merges required, which allows us to be more aggressive.

This turns an algorithm with potentially recursive lookahead (looking through GEPs, casts, loads and any other instruction potentially not CSE'd) to two linear scans.

llvm-svn: 280351
2016-09-01 10:44:35 +00:00
Hal Finkel 5081ac27c7 Add ISD::EH_DWARF_CFA, simplify @llvm.eh.dwarf.cfa on Mips, fix on PowerPC
LLVM has an @llvm.eh.dwarf.cfa intrinsic, used to lower the GCC-compatible
__builtin_dwarf_cfa() builtin. As pointed out in PR26761, this is currently
broken on PowerPC (and likely on ARM as well). Currently, @llvm.eh.dwarf.cfa is
lowered using:

  ADD(FRAMEADDR, FRAME_TO_ARGS_OFFSET)

where FRAME_TO_ARGS_OFFSET defaults to the constant zero. On x86,
FRAME_TO_ARGS_OFFSET is lowered to 2*SlotSize. This setup, however, does not
work for PowerPC. Because of the way that the stack layout works, the canonical
frame address is not exactly (FRAMEADDR + FRAME_TO_ARGS_OFFSET) on PowerPC
(there is a lower save-area offset as well), so it is not just a matter of
implementing FRAME_TO_ARGS_OFFSET for PowerPC (unless we redefine its
semantics -- We can do that, since it is currently used only for
@llvm.eh.dwarf.cfa lowering, but the better to directly lower the CFA construct
itself (since it can be easily represented as a fixed-offset FrameIndex)). Mips
currently does this, but by using a custom lowering for ADD that specifically
recognizes the (FRAMEADDR, FRAME_TO_ARGS_OFFSET) pattern.

This change introduces a ISD::EH_DWARF_CFA node, which by default expands using
the existing logic, but can be directly lowered by the target. Mips is updated
to use this method (which simplifies its implementation, and I suspect makes it
more robust), and updates PowerPC to do the same.

Fixes PR26761.

Differential Revision: https://reviews.llvm.org/D24038

llvm-svn: 280350
2016-09-01 10:28:47 +00:00
Hal Finkel 40d7f5c277 Add a counter-function insertion pass
As discussed in https://reviews.llvm.org/D22666, our current mechanism to
support -pg profiling, where we insert calls to mcount(), or some similar
function, is fundamentally broken. We insert these calls in the frontend, which
means they get duplicated when inlining, and so the accumulated execution
counts for the inlined-into functions are wrong.

Because we don't want the presence of these functions to affect optimizaton,
they should be inserted in the backend. Here's a pass which would do just that.
The knowledge of the name of the counting function lives in the frontend, so
we're passing it here as a function attribute. Clang will be updated to use
this mechanism.

Differential Revision: https://reviews.llvm.org/D22825

llvm-svn: 280347
2016-09-01 09:42:39 +00:00
Dean Michael Berris e8ae5baaf7 [XRay] Detect and emit sleds for sibling/tail calls
Summary:
This change promotes the 'isTailCall(...)' member function to
TargetInstrInfo as a query interface for determining on a per-target
basis whether a given MachineInstr is a tail call instruction. We build
upon this in the XRay instrumentation pass to emit special sleds for
tail call optimisations, where we emit the correct kind of sled.

The tail call sleds look like a mix between the function entry and
function exit sleds. Form-wise, the sled comes before the "jmp"
instruction that implements the tail call similar to how we do it for
the function entry sled. Functionally, because we know this is a tail
call, it behaves much like an exit sled -- i.e. at runtime we may use
the exit trampolines instead of a different kind of trampoline.

A follow-up change to recognise these sleds will be done in compiler-rt,
so that we can start intercepting these initially as exits, but also
have the option to have different log entries to more accurately reflect
that this is actually a tail call.

Reviewers: echristo, rSerge, majnemer

Subscribers: mehdi_amini, dberris, llvm-commits

Differential Revision: https://reviews.llvm.org/D23986

llvm-svn: 280334
2016-09-01 01:29:13 +00:00
Heejin Ahn 10a7086700 Revert "Add asm.js-style setjmp/longjmp handling for wasm"
This reverts commit r280302, it broke the integration tests.

llvm-svn: 280329
2016-09-01 00:44:37 +00:00
Nick Lewycky 97e49ac59e Add -fprofile-dir= to clang.
-fprofile-dir=path allows the user to specify where .gcda files should be
emitted when the program is run. In particular, this is the first flag that
causes the .gcno and .o files to have different paths, LLVM is extended to
support this. -fprofile-dir= does not change the file name in the .gcno (and
thus where lcov looks for the source) but it does change the name in the .gcda
(and thus where the runtime library writes the .gcda file). It's different from
a GCOV_PREFIX because a user can observe that the GCOV_PREFIX_STRIP will strip
paths off of -fprofile-dir= but not off of a supplied GCOV_PREFIX.

To implement this we split -coverage-file into -coverage-data-file and
-coverage-notes-file to specify the two different names. The !llvm.gcov
metadata node grows from a 2-element form {string coverage-file, node dbg.cu}
to 3-elements, {string coverage-notes-file, string coverage-data-file, node
dbg.cu}. In the 3-element form, the file name is already "mangled" with
.gcno/.gcda suffixes, while the 2-element form left that to the middle end
pass.

llvm-svn: 280306
2016-08-31 23:04:32 +00:00
Heejin Ahn 23d57103a4 Add asm.js-style setjmp/longjmp handling for wasm
Summary: This patch adds asm.js-style setjmp/longjmp handling support for WebAssembly. It also uses JavaScript's try and catch mechanism.

Reviewers: jpp, dschuff

Subscribers: jfb, dschuff

Differential Revision: https://reviews.llvm.org/D23928

llvm-svn: 280302
2016-08-31 22:40:34 +00:00
Reid Kleckner 109448ee81 Revert "Add an optional parameter with a list of undefs to extendToIndices"
This reverts commit r280268, it causes all MSVC 2013 to ICE. This
appears to have been fixed in a later MSVC 2013 update, because I cannot
reproduce it locally. That said, all upstream LLVM bots are broken right
now, so I am reverting.

Also reverts dependent change r280275, "[Hexagon] Deal with undefs when
extending live intervals".

llvm-svn: 280301
2016-08-31 22:36:02 +00:00
Sanjay Patel 0d70831d73 [InstCombine] allow icmp (shr exact X, C2), C fold for splat constant vectors
The enhancement to foldICmpDivConstant ( http://llvm.org/viewvc/llvm-project?view=revision&revision=280299 )
allows us to remove the ConstantInt check; no other changes needed.

llvm-svn: 280300
2016-08-31 22:18:43 +00:00
Sanjay Patel 541aef4661 [InstCombine] allow icmp (div X, Y), C folds for splat constant vectors
Converting all of the overflow ops to APInt looked risky, so I've left that as a TODO.

llvm-svn: 280299
2016-08-31 21:57:21 +00:00
Matt Arsenault b50eb8dc2b AMDGPU: Fix introducing stack access on unaligned v16i8
llvm-svn: 280298
2016-08-31 21:52:27 +00:00
Tim Northover 11a2354670 GlobalISel: use G_TYPE to annotate physregs with a type.
More preparation for dropping source types from MachineInstrs: regsters coming
out of already-selected code (i.e. non-generic instructions) don't have a type,
but that information is needed so we must add it manually.

This is done via a new G_TYPE instruction.

llvm-svn: 280292
2016-08-31 21:24:02 +00:00
Derek Schuff 1b258d313c [WebAssembly] Disable folding of GA+reg into load/store constant offsets
Summary:
If the register has a negative value then unsigned overflow will occur;
this case is sometimes even created intentionally by LSR. For now
disable GA+reg folding. Fixes PR29127

Differential Revision: https://reviews.llvm.org/D24053

llvm-svn: 280285
2016-08-31 20:27:20 +00:00
Geoff Berry 8d84605f25 [EarlyCSE] Optionally use MemorySSA. NFC.
Summary:
Use MemorySSA, if requested, to do less conservative memory dependency
checking.

This change doesn't enable the MemorySSA enhanced EarlyCSE in the
default pipelines, so should be NFC.

Reviewers: dberlin, sanjoy, reames, majnemer

Subscribers: mcrosier, llvm-commits

Differential Revision: http://reviews.llvm.org/D19821

llvm-svn: 280279
2016-08-31 19:24:10 +00:00
Quentin Colombet bd850f4185 Actually check for the diagnostic to be emitted!
This makes the test case in r280273 actually useful!

llvm-svn: 280276
2016-08-31 18:53:32 +00:00
Krzysztof Parzyszek e21a0b3b9f [Hexagon] Deal with undefs when extending live intervals
llvm-svn: 280275
2016-08-31 18:52:09 +00:00
Tom Stellard ba5730884b AMDGPU/SI: Make sure llvm.amdgcn.implicitarg.ptr() is at least 4-byte aligned
Summary: This fixes some OpenCV tests that were broken by libclc commit r276443.

Reviewers: arsenm, jvesely

Subscribers: arsenm, wdng, llvm-commits

Differential Revision: https://reviews.llvm.org/D24051

llvm-svn: 280274
2016-08-31 18:46:07 +00:00
Quentin Colombet 1c06a73a7c [TargetPassConfig] Add a hook to tell whether GlobalISel should warm on fallback.
Thanks to this patch, we know have a way to easly see if GlobalISel
failed.

llvm-svn: 280273
2016-08-31 18:43:04 +00:00
Kevin Enderby 9d0c945ad6 Next set of additional error checks for invalid Mach-O files for bad load commands
that use the Mach::linkedit_data_command type for the load commands that are
currently used in the MachOObjectFile constructor.

This contains the missing checks for LC_DATA_IN_CODE and
LC_LINKER_OPTIMIZATION_HINT load commands and the fields for the
Mach::linkedit_data_command type.  Checking for other load commands that
use this type will be added later.

Also fixed a couple of places that was using sizeof(MachOObjectFile::LoadCommandInfo)
that should have been using sizeof(MachO::load_command).

llvm-svn: 280267
2016-08-31 17:57:46 +00:00
Geoff Berry 64f5ed172a [EarlyCSE] Allow forwarding a non-invariant load into an invariant load.
Reviewers: sanjoy

Subscribers: mcrosier, llvm-commits

Differential Revision: https://reviews.llvm.org/D23935

llvm-svn: 280265
2016-08-31 17:45:31 +00:00
Teresa Johnson 8068cc68f7 [LTO] Fix common test to reflect r279911 and move to X86 subdirectory
Adjust the test to reflect the changes to common handling in r279911.
This test wasn't running due to an incorrect REQUIRES and thus missed
being modified for r279911 before. It was changed to XFAIL when the
bad REQUIRES was discovered.

Remove the XFAIL and move to a new X86 subdirectory that will properly
disable on non-X86.

llvm-svn: 280256
2016-08-31 16:15:39 +00:00
Reid Kleckner 9dac47319d [codeview] Emit vtable shape information
The shape of the vtable is passed down as the size of the
__vtbl_ptr_type. This special pointer type appears both as the pointee
type of the vptr type, and by itself in every dynamic class. For classes
with multiple vtables, only the shape of the primary vftable is
included, as the shape of all secondary vftables will be the same as in
the base class.

Fixes PR28150

llvm-svn: 280254
2016-08-31 15:59:30 +00:00
Philip Reames 2b1084ac93 [statepoints][experimental] Add support for live-in semantics of values in deopt bundles
This is a first step towards supporting deopt value lowering and reporting entirely with the register allocator. I hope to build on this in the near future to support live-on-return semantics, but I have a use case which allows me to test and investigate code quality with just the live-in semantics so I've chosen to start there. For those curious, my use cases is our implementation of the "__llvm_deoptimize" function we bind to @llvm.deoptimize. I'm choosing not to hard code that fact in the patch and instead make it configurable via function attributes.

The basic approach here is modelled on what is done for the "Live In" values on stackmaps and patchpoints. (A secondary goal here is to remove one of the last barriers to merging the pseudo instructions.) We start by adding the operands directly to the STATEPOINT SDNode. Once we've lowered to MI, we extend the remat logic used by the register allocator to fold virtual register uses into StackMap::Indirect entries as needed. This does rely on the fact that the register allocator rematerializes. If it didn't along some code path, we could end up with more vregs than physical registers and fail to allocate.

Today, we *only* fold in the register allocator. This can create some weird effects when combined with arguments passed on the stack because we don't fold them appropriately. I have an idea how to fix that, but it needs this patch in place to work on that effectively. (There's some weird interaction with the scheduler as well, more investigation needed.)

My near term plan is to land this patch off-by-default, experiment in my local tree to identify any correctness issues and then start fixing codegen problems one by one as I find them. Once I have the live-in lowering fully working (both correctness and code quality), I'm hoping to move on to the live-on-return semantics. Note: I don't have any *known* miscompiles with this patch enabled, but I'm pretty sure I'll find at least a couple. Thus, the "experimental" tag and the fact it's off by default.

Differential Revision: https://reviews.llvm.org/D24000

llvm-svn: 280250
2016-08-31 15:12:17 +00:00
Simon Pilgrim 6199b4fd49 [X86][SSE] Improve awareness of (v)cvtpd2ps implicit zeroing of upper 64-bits of xmm result
Associate x86_sse2_cvtpd2ps with X86ISD::VFPROUND to avoid inserting unnecessary zeroing shuffles.

Differential Revision: https://reviews.llvm.org/D23797

llvm-svn: 280249
2016-08-31 15:09:34 +00:00
Sjoerd Meijer 46b5b88387 Clang patch r280064 introduced ways to set the FP exceptions and denormal
types. This is the LLVM counterpart and it adds options that map onto FP
exceptions and denormal build attributes allowing better fp math library
selections.

Differential Revision: https://reviews.llvm.org/D24070

llvm-svn: 280246
2016-08-31 14:17:38 +00:00
Krzysztof Parzyszek 64488118bf Fixed spill stack objects are mutable
Differential Revision: https://reviews.llvm.org/D24039

llvm-svn: 280244
2016-08-31 13:52:17 +00:00
James Molloy cacfc16109 Revert "[SimplifyCFG] Change the algorithm in SinkThenElseCodeToEnd"
This reverts commit r280216 - it caused buildbot failures.

llvm-svn: 280234
2016-08-31 13:16:52 +00:00
James Molloy 76c9d423a7 Revert "[SimplifyCFG] Handle tail-sinking of more than 2 incoming branches"
This reverts commit r280217. r280216 caused buildbot failures - backing out the entire chain.

llvm-svn: 280233
2016-08-31 13:16:45 +00:00
James Molloy 06a45483a1 Revert "[SimplifyCFG] Add a workaround to fix PR30188"
This reverts commit r280219. r280216 caused buildbot failures - backing out the entire chain.

llvm-svn: 280232
2016-08-31 13:16:36 +00:00
James Molloy 8a66a39cbf Revert "[SimplifyCFG] Fix bootstrap failure after r280220"
This reverts commit r280228. r280216 caused buildbot failures - backing out the entire sequence.

llvm-svn: 280231
2016-08-31 13:16:30 +00:00
James Molloy b7efa6c227 [SimplifyCFG] Fix bootstrap failure after r280220
We check that a sinking candidate is used by only one PHI node during our legality checks. However for instructions that are used by other sinking candidates our heuristic is less conservative. This can result in a candidate actually being illegal when we come to sink it because of how we sunk a predecessor. Do the used-by-only-one-PHI checks again during sinking to ensure we don't crash.

llvm-svn: 280228
2016-08-31 12:33:48 +00:00
Nikolay Haustov eba808957e AMDGPU/SI: Handle aliases in AMDGPUAlwaysInlinePass
Summary:
Simply replace usage of aliases to functions with aliasee.
This came up when bitcode linking to builtin library and
calls to aliases not being resolved.

Also made minor improvements to existing test.

Reviewers: tstellarAMD, alex-t, vpykhtin

Subscribers: arsenm, wdng, rampitec

Differential Revision: https://reviews.llvm.org/D24023

llvm-svn: 280221
2016-08-31 11:18:33 +00:00
James Molloy 171fdac7ce [SimplifyCFG] Add a workaround to fix PR30188
We're sinking stores, which is a good thing, but in the process creating selects for the store address operand, which SROA/Mem2Reg can't look through, which caused serious regressions.

The real fix is in SROA, which I'll be looking into.

llvm-svn: 280219
2016-08-31 10:46:45 +00:00
James Molloy c53b40b509 [SimplifyCFG] Handle tail-sinking of more than 2 incoming branches
This was a real restriction in the original version of SinkIfThenCodeToEnd. Now it's been rewritten, the restriction can be lifted.

As part of this, we handle a very common and useful case where one of the incoming branches is actually conditional. Consider:

   if (a)
     x(1);
   else if (b)
     x(2);

This produces the following CFG:

         [if]
        /    \
      [x(1)] [if]
        |     | \
        |     |  \
        |  [x(2)] |
         \    |  /
          [ end ]

[end] has two unconditional predecessor arcs and one conditional. The conditional refers to the implicit empty 'else' arc. This same pattern can also be caused by an empty default block in a switch.

We can't sink the call to x() down to end because no call to x() happens on the third incoming arc (assume that x() has sideeffects for the sake of argument; if something is safe to speculate we could indeed sink nevertheless but this cannot happen in the general case and causes many extra selects).

We are now able to detect this case and split off the unconditional arcs to a common successor:

         [if]
        /    \
      [x(1)] [if]
        |     | \
        |     |  \
        |  [x(2)] |
         \   /    |
     [sink.split] |
           \     /
           [ end ]

Now we can sink the call to x() into %sink.split. This can cause significant code simplification in many testcases.

llvm-svn: 280217
2016-08-31 10:46:33 +00:00
James Molloy 55bd04cd20 [SimplifyCFG] Change the algorithm in SinkThenElseCodeToEnd
r279460 rewrote this function to be able to handle more than two incoming edges and took pains to ensure this didn't regress anything.

This time we change the logic for determining if an instruction should be sunk. Previously we used a single pass greedy algorithm - sink instructions until one requires more than one PHI node or we run out of instructions to sink.

This had the problem that sinking instructions that had non-identical but trivially the same operands needed extra logic so we sunk them aggressively. For example:

    %a = load i32* %b          %d = load i32* %b
    %c = gep i32* %a, i32 0    %e = gep i32* %d, i32 1

Sinking %c and %e would naively require two PHI merges as %a != %d. But the loads are obviously equivalent (and maybe can't be hoisted because there is no common predecessor).

This is why we implemented the fairly complex function areValuesTriviallySame(), to look through trivial differences like this. However it's just not clever enough.

Instead, throw areValuesTriviallySame away, use pointer equality to check equivalence of operands and switch to a two-stage algorithm.

In the "scan" stage, we look at every sinkable instruction in isolation from end of block to front. If it's sinkable, we keep track of all operands that required PHI merging.

In the "sink" stage, we iteratively sink the last non-terminator in the source blocks. But when calculating how many PHIs are actually required to be inserted (to work out if we should stop or not) we remove any values that have already been sunk from the set of PHI-merges required, which allows us to be more aggressive.

This turns an algorithm with potentially recursive lookahead (looking through GEPs, casts, loads and any other instruction potentially not CSE'd) to two linear scans.

llvm-svn: 280216
2016-08-31 10:46:23 +00:00
James Molloy 923e98c232 [SimplifyCFG] Tail-merge calls with sideeffects
This was deliberately disabled during my rewrite of SinkIfThenToEnd to keep behaviour
at least vaguely consistent with the previous version and keep it as close to NFC as
I could.

There's no real reason not to merge sideeffect calls though, so let's do it! Small fixup
along the way to ensure we don't create indirect calls.

Should fix PR28964.

llvm-svn: 280215
2016-08-31 10:46:16 +00:00
Simon Pilgrim 7b09af193a [X86][SSE] Improve awareness of fptrunc implicit zeroing of upper 64-bits of xmm result
Add patterns to avoid inserting unnecessary zeroing shuffles when lowering fptrunc to (v)cvtpd2ps

Differential Revision: https://reviews.llvm.org/D23797

llvm-svn: 280214
2016-08-31 10:35:13 +00:00
Craig Topper 8f6827c945 [AVX-512] Add patterns to select masked logical operations if the select has a floating point type.
This is needed in order to replace the masked floating point logical op intrinsics with native IR.

llvm-svn: 280195
2016-08-31 05:37:52 +00:00
Craig Topper 0f8fb47637 [AVX-512] Add test cases for masked floating point logic operations with bitcasts between the logic ops and the select. We don't currently select masked operations for these cases.
Test cases taken from optimized clang output after trying to convert the masked floating point logical op intrinsics to native IR.

llvm-svn: 280194
2016-08-31 05:37:50 +00:00
Craig Topper de8b1a0012 [X86] Regenerate a test using update_llc_test_checks.py.
llvm-svn: 280193
2016-08-31 05:37:47 +00:00
Dean Michael Berris 047669f18c [XRay] Support multiple return instructions in a single basic block
Add a .mir test to catch this case, and fix the xray-instrumentation
pass to handle it appropriately.

llvm-svn: 280192
2016-08-31 05:20:08 +00:00
Hal Finkel 97a189c716 [PowerPC] Don't spill the frame pointer twice
When a function contains something, such as inline asm, which explicitly
clobbers the register used as the frame pointer, don't spill it twice. If we
need a frame pointer, it will be saved/restored in the prologue/epilogue code.
Explicitly spilling it again will reuse the same spill slot used by the
prologue/epilogue code, thus clobbering the saved value. The same applies
to the base-pointer or PIC-base register.

Partially fixes PR26856. Thanks to Ulrich for his analysis and the small
inline-asm reproducer.

llvm-svn: 280188
2016-08-31 00:52:03 +00:00
Gor Nishanov 50d7fb974f [Coroutines] Part 10: Add coroutine promise support.
Summary:
1) CoroEarly now lowers llvm.coro.promise intrinsic that allows to obtain
a coroutine promise pointer from a coroutine frame and vice versa.

2) CoroFrame now interprets Promise argument of llvm.coro.begin to
place CoroutinPromise alloca at a deterministic offset from the coroutine frame.

Now, the coroutine promise example from docs\Coroutines.rst compiles and produces expected result (see test/Transform/Coroutines/ex4.ll).

Reviewers: majnemer

Subscribers: llvm-commits, mehdi_amini

Differential Revision: https://reviews.llvm.org/D23993

llvm-svn: 280184
2016-08-31 00:35:41 +00:00
Vedant Kumar 8938f92a5e [llvm-cov] Drop redundant "No." suffix in a column title
llvm-svn: 280181
2016-08-31 00:09:44 +00:00
Alina Sbirlea 3f8f7840bf [LoadStoreVectorizer] Change VectorSet to Vector to match head and tail positions. Resolves PR29148.
Summary:
LSV was using two vector sets (heads and tails) to track pairs of adjiacent position to vectorize.
A recent optimization is trying to obtain the longest chain to vectorize and assumes the positions
in heads(H) and tails(T) match, which is not the case is there are multiple tails for the same head.

e.g.:
i1: store a[0]
i2: store a[1]
i3: store a[1]
Leads to:
H: i1
T: i2 i3
Instead of:
H: i1 i1
T: i2 i3
So the positions for instructions that follow i3 will have different indexes in H/T.
This patch resolves PR29148.

This issue also surfaced the fact that if the chain is too long, and TLI
returns a "not-fast" answer, the whole chain will be abandoned for
vectorization, even though a smaller one would be beneficial.
Added a testcase and FIXME for this.

Reviewers: tstellarAMD, arsenm, jlebar

Subscribers: mzolotukhin, wdng, llvm-commits

Differential Revision: https://reviews.llvm.org/D24057

llvm-svn: 280179
2016-08-30 23:53:59 +00:00
Sanjay Patel ddb53dd080 [InstCombine] add tests to show type limitations of InsertRangeTest and callers
llvm-svn: 280175
2016-08-30 23:16:59 +00:00
Kevin Enderby 36865d6abf Add a test file, macho-invalid-dysymtab-extreloff-nextrel,
I forgot to do an svn add on.

llvm-svn: 280167
2016-08-30 21:48:06 +00:00
Kevin Enderby dcbc504c47 Next set of additional error checks for invalid Mach-O files for bad LC_DYSYMTAB’s.
This contains the missing checks for LC_DYSYMTAB load command fields.

llvm-svn: 280161
2016-08-30 21:28:30 +00:00
Tim Northover 991b12bf09 GlobalISel: combine extracts & sequences created for legalization
Legalization ends up creating many G_SEQUENCE/G_EXTRACT pairs which leads to
inefficient codegen (even for -O0), so add a quick pass over the function to
remove them again.

llvm-svn: 280155
2016-08-30 20:51:25 +00:00
Matt Arsenault a609e2d5ce AMDGPU: Relax SGPR asm constraint register class
s should be SReg_32 to be as general as possible. This can avoid a copy
from m0.

llvm-svn: 280154
2016-08-30 20:50:08 +00:00
Hemant Kulkarni 1c852951a7 Revert "ELFDumper: Unversioned symbols must not have trailing @"
This reverts commit 8df7a877949e8782a3a28e3ecdb0770c1e444056.

Fixing other repositories and adding changes together.

llvm-svn: 280152
2016-08-30 20:42:46 +00:00
Michael Kuperstein 2954d1db77 [LoopVectorizer] Predicate instructions in blocks with several incoming edges
We don't need to limit predication to blocks that have a single incoming
edge, we just need to use the right mask.
This fixes PR30172.

Differential Revision: https://reviews.llvm.org/D24009

llvm-svn: 280148
2016-08-30 20:22:21 +00:00
Daniel Berlin c943d72d94 IntrArgMemOnly is only defined (and current AA machinery only sanely supports) pointer arguments, and these intrinsics have vector of pointer arguments. Remove ArgMemOnly until we either have the machinery, define a new attribute, or something similar
llvm-svn: 280143
2016-08-30 19:58:48 +00:00
Hemant Kulkarni b2081648b2 ELFDumper: Unversioned symbols must not have trailing @
llvm-svn: 280140
2016-08-30 19:50:02 +00:00
Tim Northover e5102de678 GlobalISel: forbid physical registers on generic MIs.
We're intending to move to a world where the type of a register is determined
by its (unique) def. This is incompatible with physregs, which are untyped.

It also means the other passes don't have to worry quite so much about
register-class compatibility and inserting COPYs appropriately.

llvm-svn: 280132
2016-08-30 18:52:46 +00:00
Saleem Abdulrasool 6a405448f3 llvm-readobj: add support for printing GNU Notes
Add support for printing the GNU Notes.  This allows an easy way to view the
build id for a binary built with the build id.  Currently, this only handles the
GNU notes, though it would be easy to extend for other note types (default,
FreeBSD, NetBSD, etc).  Only the GNU style is supported currently.

llvm-svn: 280131
2016-08-30 18:52:02 +00:00
Sanjay Patel b37145712e [InstCombine] replace divide-by-constant checks with asserts; NFC
These folds already have tests for scalar and vector types, except 
for the vector div-by-0 case, so I'm adding tests for that.

llvm-svn: 280115
2016-08-30 17:31:34 +00:00
James Molloy d13b1239e4 [SimplifyCFG] Properly CSE metadata in SinkThenElseCodeToEnd
This was missing, meaning the metadata in sunk instructions was potentially bogus and could cause miscompiles.

llvm-svn: 280072
2016-08-30 10:56:08 +00:00
Ying Yi 76eb219c9b [llvm-cov] Use the native path in the coverage report.
The coverage reports contain the source or binary file paths. On Windows, 
the file path might contain the seperators of both '/' and '\'. This patch 
uses the native path in the coverage reports. For example, on Windows, 
all '/' are converted to '\'.

Differential Revision: https://reviews.llvm.org/D23922

llvm-svn: 280061
2016-08-30 07:01:37 +00:00
Hal Finkel 18d0e3f44c [PowerPC] Force entry alignment in .got2
Implement Bill's suggested fix for 32-bit targets for PR22711 (for the
alignment of each entry). As pointed out in the bug report, we could just force
the section alignment, since we only add pointer-sized things currently, but
this fix is somewhat more future-proof.

llvm-svn: 280049
2016-08-30 01:43:38 +00:00
Kostya Serebryany 5ac427b8e4 [sanitizer-coverage] add two more modes of instrumentation: trace-div and trace-gep, mostly usaful for value-profile-based fuzzing; llvm part
llvm-svn: 280043
2016-08-30 01:12:10 +00:00
Hal Finkel b074a608ce [PowerPC] Add support for -mlongcall
The "long call" option forces the use of the indirect calling sequence for all
calls (even those that don't really need it). GCC provides this option; This is
helpful, under certain circumstances, for building very-large binaries, and
some other specialized use cases.

Fixes PR19098.

llvm-svn: 280040
2016-08-30 00:59:23 +00:00
Hal Finkel a819cda059 [PowerPC] Add triple to test/CodeGen/PowerPC/atomic-2.ll for ppc64le
Otherwise, running the test on Darwin systems will not work.

llvm-svn: 280034
2016-08-30 00:22:22 +00:00
Teresa Johnson 8c1bc986b5 [ThinLTO] Indirect call promotion fixes for promoted local functions
Summary:
Fix a couple issues limiting the application of indirect call promotion
in ThinLTO mode:
- Invoke indirect call promotion before globalopt, since it may
  eliminate imported functions which appear unreferenced.
- Invoke indirect call promotion with InLTO=true so that the PGOFuncName
  metadata is used to get the name for locals which would have been
  renamed during promotion.

Reviewers: davidxl, mehdi_amini

Subscribers: Prazek, llvm-commits, mehdi_amini

Differential Revision: https://reviews.llvm.org/D24004

llvm-svn: 280024
2016-08-29 22:46:56 +00:00
Hal Finkel 3d70a9dbb7 [PowerPC] Fix i8/i16 atomics for little-Endian targets without partword atomics
For little-Endian PowerPC, we generally target only P8 and later by default.
However, generic (older) 64-bit configurations are still an option, and in that
case, partword atomics are not available (e.g. stbcx.). To lower i8/i16 atomics
without true i8/i16 atomic operations, we emulate using i32 atomics in
combination with a bunch of shifting and masking, etc. The amount by which to
shift in little-Endian mode is different from the amount in big-Endian mode (it
is inverted -- meaning we can leave off the xor when computing the amount).

Fixes PR22923.

llvm-svn: 280022
2016-08-29 22:25:36 +00:00
Saleem Abdulrasool ef72107a49 ExecutionEngine: fix a bug in the movt/movw relocator
According to the arm arm specifications, 4 bytes are needed for a shift instead
of 8, this was causing the movt instruction to write to a different register
sometimes.

Patch by Walter Erquinigo!

llvm-svn: 280005
2016-08-29 20:42:03 +00:00
Matthew Simpson df19502b16 [LV] Move insertelement sequence after scalar definitions
After r279649 when getting a vector value from VectorLoopValueMap, we create an
insertelement sequence on-demand if the value has been scalarized instead of
vectorized. We previously inserted this insertelement sequence before the
value's first vector user. However, this insert location is problematic if that
user is the phi node of a first-order recurrence. With this patch, we move the
insertelement sequence after the last scalar instruction we created when
scalarizing the value. Thus, the value's vector definition in the new loop will
immediately follow its scalar definitions. This should fix PR30183.

Reference: https://llvm.org/bugs/show_bug.cgi?id=30183
llvm-svn: 280001
2016-08-29 20:14:04 +00:00
Krzysztof Parzyszek 354832e585 Propagate TBAA info in SelectionDAG::getIndexedLoad
Patch by Pranav Bhandarkar.

llvm-svn: 279998
2016-08-29 19:50:15 +00:00
Tom Stellard 0d23ebe888 AMDGPU/SI: Implement a custom MachineSchedStrategy
Summary:
GCNSchedStrategy re-uses most of GenericScheduler, it's just uses
a different method to compute the excess and critical register
pressure limits.

It's not enabled by default, to enable it you need to pass -misched=gcn
to llc.

Shader DB stats:

32464 shaders in 17874 tests
Totals:
SGPRS: 1542846 -> 1643125 (6.50 %)
VGPRS: 1005595 -> 904653 (-10.04 %)
Spilled SGPRs: 29929 -> 27745 (-7.30 %)
Spilled VGPRs: 334 -> 352 (5.39 %)
Scratch VGPRs: 1612 -> 1624 (0.74 %) dwords per thread
Code Size: 36688188 -> 37034900 (0.95 %) bytes
LDS: 1913 -> 1913 (0.00 %) blocks
Max Waves: 254101 -> 265125 (4.34 %)
Wait states: 0 -> 0 (0.00 %)

Totals from affected shaders:
SGPRS: 1338220 -> 1438499 (7.49 %)
VGPRS: 886221 -> 785279 (-11.39 %)
Spilled SGPRs: 29869 -> 27685 (-7.31 %)
Spilled VGPRs: 334 -> 352 (5.39 %)
Scratch VGPRs: 1612 -> 1624 (0.74 %) dwords per thread
Code Size: 34315716 -> 34662428 (1.01 %) bytes
LDS: 1551 -> 1551 (0.00 %) blocks
Max Waves: 188127 -> 199151 (5.86 %)
Wait states: 0 -> 0 (0.00 %)

Reviewers: arsenm, mareko, nhaehnle, MatzeB, atrick

Subscribers: arsenm, kzhuravl, llvm-commits

Differential Revision: https://reviews.llvm.org/D23688

llvm-svn: 279995
2016-08-29 19:42:52 +00:00
Vitaly Buka 3c4f6bf654 [asan] Enable new stack poisoning with store instruction by default
Reviewers: eugenis

Subscribers: llvm-commits

Differential Revision: https://reviews.llvm.org/D23968

llvm-svn: 279993
2016-08-29 19:28:34 +00:00
Tom Stellard c2ff0eb697 AMDGPU/SI: Improve SILoadStoreOptimizer and run it before the scheduler
Summary:
The SILoadStoreOptimizer can now look ahead more then one instruction when
looking for instructions to merge, which greatly improves the number of
loads/stores that we are able to merge.

Moving the pass before scheduling avoids increasing register pressure after
the scheduler, so that the scheduler's register pressure estimates will be
more accurate.  It also gives more consistent results, since it is no longer
affected by minor scheduling changes.

Reviewers: arsenm

Subscribers: arsenm, kzhuravl, llvm-commits

Differential Revision: https://reviews.llvm.org/D23814

llvm-svn: 279991
2016-08-29 19:15:22 +00:00
Tim Northover edb3c8ccb8 GlobalISel: legalize frem to a libcall on AArch64.
llvm-svn: 279988
2016-08-29 19:07:16 +00:00
Matt Arsenault b90fc9b3b4 AMDGPU/R600: Fix fixups used for constant arrays
Fixes bug 29289

llvm-svn: 279986
2016-08-29 19:01:48 +00:00
Kyle Butt 092c4dd5b6 IfConversion: Fix branch predication bug.
This bug shows up with diamonds that share unpredicable, unanalyzable branches.
There's an included test case from Hexagon. What was happening was that we were
attempting to predicate the branch instruction despite the fact that it was
checked to be the same. Now for unanalyzable branches we skip over the branch
instructions when predicating the block.

Differential Revision: https://reviews.llvm.org/D23939

llvm-svn: 279985
2016-08-29 18:27:12 +00:00
Vitaly Buka 793913c7eb Use store operation to poison allocas for lifetime analysis.
Summary:
Calling __asan_poison_stack_memory and __asan_unpoison_stack_memory for small
variables is too expensive.

Code is disabled by default and can be enabled by -asan-experimental-poisoning.

PR27453

Reviewers: eugenis

Subscribers: llvm-commits

Differential Revision: https://reviews.llvm.org/D23947

llvm-svn: 279984
2016-08-29 18:17:21 +00:00
David Majnemer e8fd5f9ffd [SimplifyCFG] Hoisting invalidates metadata
We forgot to remove optimization metadata when performing hosting during
FoldTwoEntryPHINode.

This fixes PR29163.

llvm-svn: 279980
2016-08-29 17:14:08 +00:00
Reid Kleckner cfec5ff1b9 Make vec_fabs.ll pass with MSVC 2013
We should revert this change once we drop support for MSVC 2013.

llvm-svn: 279979
2016-08-29 16:35:43 +00:00
Teresa Johnson d0b87b1a89 [gold] Fix test accidentally regressed for newer gold
With r279911 I accidentally regressed the gold/X86/start-lib-common.ll
test for newer golds (v1.12+) that honor the --start-lib/--end-lib.
Remove the alignment which should not be there to make this work with
both old and new gold linkers.

Additionally, now that we have a subdirectory for v1.12+ gold tests,
copy this test there and check specifically for the v1.12+ behavior.

llvm-svn: 279977
2016-08-29 16:22:23 +00:00