Commit Graph

14697 Commits

Author SHA1 Message Date
Jun Bum Lim 6755c3bc5f [AArch64] Promote loads from stored
This is a recommit of r256004 which was reverted in r256160. The issue was the
incorrect promotion for half and byte loads transformed into mov instructions.
This fix will replace half and byte type loads only with bit field extracts.

Original commit message:

This change promotes load instructions which directly read from stored by
replacing them with mov instructions. If the store is wider than the load,
the load will be replaced with a bitfield extract.
For example :
  STRWui %W1, %X0, 1
  %W0 = LDRHHui %X0, 3
becomes
  STRWui %W1, %X0, 1
  %W0 = UBFMWri %W1, 16, 31

llvm-svn: 256249
2015-12-22 16:36:16 +00:00
Asaf Badouh 13ffa4bf7c [X86][AVX512] Add rcp14 and rsqrt14 intrinsics
Differential Revision: http://reviews.llvm.org/D15414

llvm-svn: 256237
2015-12-22 11:40:04 +00:00
Keno Fischer 4eccf11373 [ASMPrinter] Fix missing handling of DW_OP_bit_piece
In r256077, I added printing for DIExpressions in DEBUG_VALUE comments,
but neglected to handle DW_OP_bit_piece operands. Thanks to
Mikael Holmen and Joerg Sonnenberger for spotting this.

llvm-svn: 256236
2015-12-22 07:14:50 +00:00
David Majnemer ff1d084aa2 [MC] Don't use the architecture to govern which object file format to use
InitMCObjectFileInfo was trying to override the triple in awkward ways.
For example, a triple specifying COFF but not Windows was forced as ELF.
This makes it easy for internal invariants to get violated, such as
those which triggered PR25912.

This fixes PR25912.

llvm-svn: 256226
2015-12-22 01:39:04 +00:00
Jun Bum Lim a23e5f7516 Enhance BranchProbabilityInfo::calcUnreachableHeuristics for InvokeInst
This is recommit of r256028 with minor fixes in unittests:
  CodeGen/Mips/eh.ll
  CodeGen/Mips/insn-zero-size-bb.ll

Original commit message:

When identifying blocks post-dominated by an unreachable-terminated block
in BranchProbabilityInfo, consider only the edge to the normal destination
block if the terminator is InvokeInst and let calcInvokeHeuristics() decide
edge weights for the InvokeInst.

llvm-svn: 256202
2015-12-21 22:00:51 +00:00
Cong Hou 8df93ce455 [X86][SSE] Transform truncations between vectors of integers into X86ISD::PACKUS/PACKSS operations during DAG combine.
This patch transforms truncation between vectors of integers into
X86ISD::PACKUS/PACKSS operations during DAG combine. We don't do it in
lowering phase because after type legalization, the original truncation
will be turned into a BUILD_VECTOR with each element that is extracted
from a vector and then truncated, and from them it is difficult to do
this optimization. This greatly improves the performance of truncations
on some specific types.

Cost table is updated accordingly.


Differential revision: http://reviews.llvm.org/D14588

llvm-svn: 256194
2015-12-21 20:42:43 +00:00
Adrian Prantl 6b44541015 Convert the CodeGen/ARM/sched-it-debug-nodes.ll testcase from IR -> MIR.
NFC
PR24563

llvm-svn: 256187
2015-12-21 19:44:42 +00:00
Adrian Prantl 5d9acc2443 Teach ARMLoadStoreOptimizer to ignore DBG_VALUE instructions when merging
instructions.

As noted in PR24563.
rdar://problem/23963293

llvm-svn: 256183
2015-12-21 19:25:03 +00:00
Matthew Simpson 11c4de6054 [AArch64] Add additional extract-extend patterns for smov
This patch adds to the target description two additional patterns for matching
extract-extend operations to SMOV. The patterns catch the v16i8-to-i64 and
v8i16-to-i64 cases. The existing patterns miss these cases because the
extracted elements must first be legalized to i32, resulting in any_extend
nodes.

This was originally implemented as a DAG combine (r255895), but was reverted
due to failing out-of-tree tests.

llvm-svn: 256176
2015-12-21 18:31:25 +00:00
Jun Bum Lim 4bb171c8da Revert "[AArch64] Promote loads from stores"
This reverts commit r256004 due to a failure in cortex-a53.

llvm-svn: 256160
2015-12-21 15:36:49 +00:00
Chad Rosier d016574df8 [AArch64] Enable PostRAScheduler for AArch64 generic build.
Disable post-ra scheduler for perturbed tests to appease the bots and to
preserve the history of the tests.

http://reviews.llvm.org/D15652

llvm-svn: 256158
2015-12-21 14:43:45 +00:00
Igor Breger 44b60a3687 AVX512BW: Enable AND/OR/XOR vector byte/word paked operation by promoting to qword that natively suppored.
llvm-svn: 256157
2015-12-21 14:40:36 +00:00
Amjad Aboud 60b5e1b6c0 Implemented Support of IA interrupt and exception handlers:
http://lists.llvm.org/pipermail/cfe-dev/2015-September/045171.html

Differential Revision: http://reviews.llvm.org/D15567

llvm-svn: 256155
2015-12-21 14:07:14 +00:00
Craig Topper 074e845260 [X86] Prevent constant hoisting for a couple compare immediates that the selection DAG knows how to optimize into a shift.
This allows "icmp ugt %a, 4294967295" and "icmp uge %a, 4294967296" to be optimized into right shifts by 32 which can fold the immediate into the shift instruction. These patterns show up with some regularity in real code.

Unfortunately, since getImmCost can't see the icmp predicate we can't be tell if we're only catching these specific cases.

llvm-svn: 256126
2015-12-20 18:41:54 +00:00
Weiming Zhao 613c6862fa Fix mapping of @llvm.arm.ssat/usat intrinsics to ssat/usat instructions for Thumb2
Summary:
r250697 fixed the mapping for ARM mode. We have to do the same for Thumb2 otherwise the same llvm.arm.ssat() will generate different saturating amount for ARM and Thumb.

r250697: http://reviews.llvm.org/rL250697

Reviewers: rmaprath

Subscribers: aemerson, llvm-commits, rengolin

Differential Revision: http://reviews.llvm.org/D15653

llvm-svn: 256115
2015-12-20 06:41:44 +00:00
JF Bastien 374ea4bda5 WebAssembly: add vtable test
The test will mainly be useful to check that the .s file assembles and relocates properly because vtables reference functions in their data section.

llvm-svn: 256102
2015-12-19 18:55:18 +00:00
Keno Fischer f7346e0a6c Hopefully fix debug-info-blocks.ll test on win32 bot
llc_dwarf adds an mtriple, which forces this to use COFF, causing
the test to fail. Hopefully using regular llc without the triple
will work fine everywhere

llvm-svn: 256084
2015-12-19 03:32:23 +00:00
Keno Fischer 00cbf9a69a Clean up the processing of dbg.value in various places
Summary:
First up is instcombine, where in the dbg.declare -> dbg.value conversion,
the llvm.dbg.value needs to be called on the actual loaded value, rather
than the address (since the whole point of this transformation is to be
able to get rid of the alloca). Further, now that that's cleaned up, we
can remove a hack in the backend, that would add an implicit OP_deref if
the argument to dbg.value was an alloca. This stems from before the
existence of DIExpression and is no longer necessary since the deref can
be expressed explicitly.

Now, in order to make sure that the tests pass with this change, we need to
correct the printing of DEBUG_VALUE comments to take into account the
expression, which wasn't taken into account before.

Unfortunately, for both these changes, there were a number of incorrect
test cases (mostly the wrong number of DW_OP_derefs, but also a couple
where the test itself was broken more badly). aprantl and I have gone
through and adjusted these test case in order to make them pass with
these fixes and in some cases to make sure they're actually testing
what they are meant to test.

Reviewers: aprantl

Subscribers: dsanders

Differential Revision: http://reviews.llvm.org/D14186

llvm-svn: 256077
2015-12-19 02:02:44 +00:00
Matt Arsenault 2aed6ca1d3 AMDGPU: Switch barrier intrinsics to using convergent
noduplicate prevents unrolling of small loops that happen to have
barriers in them. If a loop has a barrier in it, it is OK to duplicate
it for the unroll.

llvm-svn: 256075
2015-12-19 01:46:41 +00:00
Matt Arsenault 10a509292c Fix broken type legalization of min/max
This was using an anyext when promoting the type
when zext/sext is required.

llvm-svn: 256074
2015-12-19 01:39:48 +00:00
Nicolai Haehnle dd58705af6 AMDGPU: fix overlapping copies in copyPhysReg
Summary:
When copying aggregate registers within the same register class, there may
be an overlap between source and destination that forces us to do the copy
backwards.

Do the simplest possible thing that guarantees the correct order of moves
when there are overlaps, and does whatever when there is no overlap. (The
last part forces some trivial adjustments to test cases.)

Together with r255906, this fixes a VM fault in Unreal Elemental Demo.

While at it, change the generation of kill and def flags to something that
looks more reasonable. This method is used very late during compilation, so
it probably doesn't matter in practice, and to be honest, I don't know if
this change is actually correct because the semantics in connection with
aggregate registers vs. sub-registers are not clear to me.

Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=93264

Reviewers: arsenm, tstellarAMD

Subscribers: arsenm, llvm-commits

Differential Revision: http://reviews.llvm.org/D15622

llvm-svn: 256072
2015-12-19 01:16:06 +00:00
Rafael Espindola 708a91a103 Revert "Enhance BranchProbabilityInfo::calcUnreachableHeuristics for InvokeInst"
This reverts commit r256028.

It broke:

    LLVM :: CodeGen/Mips/eh.ll
    LLVM :: CodeGen/Mips/insn-zero-size-bb.ll

llvm-svn: 256032
2015-12-18 21:23:32 +00:00
Jun Bum Lim 51a247065e Enhance BranchProbabilityInfo::calcUnreachableHeuristics for InvokeInst
When identifying blocks post-dominated by an unreachable-terminated block
in BranchProbabilityInfo, consider only the edge to the normal destination
block if the terminator is InvokeInst and let calcInvokeHeuristics() decide
edge weights for the InvokeInst.

llvm-svn: 256028
2015-12-18 20:53:47 +00:00
Krzysztof Parzyszek 21dc8bdd9e [Hexagon] Add PIC support
llvm-svn: 256025
2015-12-18 20:19:30 +00:00
Jun Bum Lim 3509d64c24 [AArch64] Promote loads from stores
This change promotes load instructions which directly read from stores by
replacing them with mov instructions. If the store is wider than the load,
the load will be replaced with a bitfield extract.
For example :
  STRWui %W1, %X0, 1
  %W0 = LDRHHui %X0, 3
becomes
  STRWui %W1, %X0, 1
  %W0 = UBFMWri %W1, 16, 31

llvm-svn: 256004
2015-12-18 18:08:30 +00:00
Hans Wennborg a6a2e512cf [X86] Use push-pop for materializing small constants under 'minsize'
Use the 3-byte (4 with REX prefix) push-pop sequence for materializing
small constants. This is smaller than using a mov (5, 6 or 7 bytes
depending on size and REX prefix), but it's likely to be slower, so
only used for 'minsize'.

This is a follow-up to r255656.

Differential Revision: http://reviews.llvm.org/D15549

llvm-svn: 255936
2015-12-17 23:18:39 +00:00
Matthew Simpson 13dddb0799 Revert "[AArch64] Add DAG combine for extract extend pattern"
This reverts commit r255895. The patch breaks internal tests. Reverting until a
fix is ready.

llvm-svn: 255928
2015-12-17 21:29:47 +00:00
Dan Gohman 670a60ed52 [WebAssembly] Switch WebAssemblyMCAsmInfo.h from MCAsmInfo to MCAsmInfoELF.
llvm-svn: 255925
2015-12-17 20:50:45 +00:00
Tom Stellard caaa3aa07c AMDGPU/SI: Reserve appropriate number of sgprs for flat scratch init.
Reviewers: tstellarAMD

Subscribers: arsenm, llvm-commits

Differential Revision: http://reviews.llvm.org/D15583

Patch by: Changpeng Fang

llvm-svn: 255908
2015-12-17 17:05:09 +00:00
Matthew Simpson 4355e404d5 [AArch64] Add DAG combine for extract extend pattern
This patch adds a DAG combine for (any_extend (extract_vector_elt v, i)) ->
(extract_vector_elt v, i). The combine enables us to better match some SMOV
patterns.

Differential Revision: http://reviews.llvm.org/D15515

llvm-svn: 255895
2015-12-17 14:30:55 +00:00
Alexey Bataev 7b72b658cc [X86] Add option for enabling LEA optimization pass, by Andrey Turetsky
Add option to enable/disable LEA optimization pass. By default the pass is disabled.
Differential Revision: http://reviews.llvm.org/D15573

llvm-svn: 255881
2015-12-17 07:34:39 +00:00
Cong Hou b9e8d483b5 Fix PR25838.
This is a quick fix to PR25838. The issue comes from the restriction that we
cannot normalize probabilities containing both known and unknown ones. A patch
that removes this restriction is under the review now:

http://reviews.llvm.org/D15548

llvm-svn: 255867
2015-12-17 01:29:08 +00:00
Dan Gohman 4172953813 [WebAssembly] Fix legalization of shift operators on large integer types.
llvm-svn: 255847
2015-12-16 23:25:51 +00:00
Derek Schuff 8bb5f2927a [WebAssembly] Implement eliminateCallFramePseudo
Summary:
Implement eliminateCallFramePsuedo to handle ADJCALLSTACKUP/DOWN
pseudo-instructions. Add a test calling a vararg function which causes non-0
adjustments. This revealed an issue with RegisterCoalescer wherein it
eliminates a COPY from SP32 to a vreg but failes to update the live ranges
of EXPR_STACK, causing a machineinstr verifier failure (so this test
is commented out).

Also add a dynamic alloca test, which causes a callseq_end dag node with
a 0 (instead of undef) second argument to be generated. We currently fail to
select that, so adjust the ADJCALLSTACKUP tablegen code to handle it.

Differential Revision: http://reviews.llvm.org/D15587

llvm-svn: 255844
2015-12-16 23:21:30 +00:00
Eric Christopher bfba572425 Fix funciton->function typo.
llvm-svn: 255841
2015-12-16 23:10:53 +00:00
Manman Ren cbe4f9417d CXX_FAST_TLS calling convention: performance improvement for AArch64.
The access function has a short entry and a short exit, the initialization
block is only run the first time. To improve the performance, we want to
have a short frame at the entry and exit.

We explicitly handle most of the CSRs via copies. Only the CSRs that are not
handled via copies will be in CSR_SaveList.

Frame lowering and prologue/epilogue insertion will generate a short frame
in the entry and exit according to CSR_SaveList. The majority of the CSRs will
be handled by register allcoator. Register allocator will try to spill and
reload them in the initialization block.

We add CSRsViaCopy, it will be explicitly handled during lowering.

1> we first set FunctionLoweringInfo->SplitCSR if conditions are met (the target
   supports it for the given machine function and the function has only return
   exits). We also call TLI->initializeSplitCSR to perform initialization.
2> we call TLI->insertCopiesSplitCSR to insert copies from CSRsViaCopy to
   virtual registers at beginning of the entry block and copies from virtual
   registers to CSRsViaCopy at beginning of the exit blocks.
3> we also need to make sure the explicit copies will not be eliminated.

The target independent portion was committed as r255353.
rdar://problem/23557469

Differential Revision: http://reviews.llvm.org/D15341

llvm-svn: 255821
2015-12-16 21:04:19 +00:00
Derek Schuff 45cd5a79b2 [WebAssembly] Print an extra local decl when the user stack pointer is used
Differential Revision: http://reviews.llvm.org/D15546

llvm-svn: 255815
2015-12-16 20:43:06 +00:00
Dan Gohman b3aa1ecab0 [WebAssembly] Fix the CFG Stackifier to handle unoptimized branches
If a branch both branches to and falls through to the same block, treat it as
an explicit branch.

llvm-svn: 255803
2015-12-16 19:06:41 +00:00
Dan Gohman e2831b4e27 [WebAssembly] Use the new offset syntax for memory operands in inline asm.
llvm-svn: 255788
2015-12-16 18:14:49 +00:00
Ulrich Weigand 47f3649374 [SystemZ] Fix assertion failure in adjustSubwordCmp
When comparing a zero-extended value against a constant small enough to
be in range of the inner type, it doesn't matter whether a signed or
unsigned compare operation (for the outer type) is being used.  This is
why the code in adjustSubwordCmp had this assertion:

    assert(C.ICmpType == SystemZICMP::Any &&
           "Signedness shouldn't matter here.");

assuming the the caller had already detected that fact.  However, it
turns out that there cases, in particular with always-true or always-
false conditions that have not been eliminated when compiling at -O0,
where this is not true.

Instead of failing an assertion if C.ICmpType is not SystemZICMP::Any
here, we can simply *set* it safely to SystemZICMP::Any, however.

llvm-svn: 255786
2015-12-16 18:04:06 +00:00
Tobias Edler von Koch b51460cf86 [Hexagon] Make memcpy lowering thread-safe
This removes an unpleasant hack involving a global variable for special
lowering of certain memcpy calls. These are now lowered as intended in
EmitTargetCodeForMemcpy in the same way that other targets do it.

llvm-svn: 255785
2015-12-16 17:29:37 +00:00
Dan Gohman 30a42bf585 [WebAssembly] Support more kinds of inline asm operands
llvm-svn: 255782
2015-12-16 17:15:17 +00:00
Michael Kuperstein e75e6e2a23 [X86] Improve shift combining
This folds (ashr (shl a, [56,48,32,24,16]), SarConst)
into       (shl, (sext (a), [56,48,32,24,16] - SarConst))
or into    (lshr, (sext (a), SarConst - [56,48,32,24,16]))
depending on sign of (SarConst - [56,48,32,24,16])

sexts in X86 are MOVs.
The MOVs have the same code size as above SHIFTs (only SHIFT by 1 has lower code size).
However the MOVs have 2 advantages to SHIFTs on x86:
1. MOVs can write to a register that differs from source.
2. MOVs accept memory operands.

This fixes PR24373.

Patch by: evgeny.v.stupachenko@intel.com
Differential Revision: http://reviews.llvm.org/D13161

llvm-svn: 255761
2015-12-16 11:22:37 +00:00
Chen Li e2222fdf95 Remove FileCheck from test case token_landingpad.ll.
The test case only needs to make sure it does not crash LLVM.

llvm-svn: 255755
2015-12-16 06:27:09 +00:00
Chen Li 3e0da9de77 Fixed test case in rL255749: [SelectionDAGBuilder] Adds support for landingpads of token type.
llvm-svn: 255750
2015-12-16 05:05:18 +00:00
Chen Li 3e8330a1fe [SelectionDAGBuilder] Adds support for landingpads of token type
Summary: This patch adds a check in visitLandingPad to see if landingpad's result type is token type. If so, do not create DAG nodes for its exception pointer and selector value. This patch enables the back end to handle landingpads of token type.

Reviewers: JosephTremoulet, majnemer, rnk

Subscribers: sanjoy, llvm-commits

Differential Revision: http://reviews.llvm.org/D15405

llvm-svn: 255749
2015-12-16 04:48:42 +00:00
Philip Reames 61a24ab6cc [IR] Add support for floating pointer atomic loads and stores
This patch allows atomic loads and stores of floating point to be specified in the IR and adds an adapter to allow them to be lowered via existing backend support for bitcast-to-equivalent-integer idiom.

Previously, the only way to specify a atomic float operation was to bitcast the pointer to a i32, load the value as an i32, then bitcast to a float. At it's most basic, this patch simply moves this expansion step to the point we start lowering to the backend.

This patch does not add canonicalization rules to convert the bitcast idioms to the appropriate atomic loads. I plan to do that in the future, but for now, let's simply add the support. I'd like to get instruction selection working through at least one backend (x86-64) without the bitcast conversion before canonicalizing into this form.

Similarly, I haven't yet added the target hooks to opt out of the lowering step I added to AtomicExpand. I figured it would more sense to add those once at least one backend (x86) was ready to actually opt out.

As you can see from the included tests, the generated code quality is not great. I plan on submitting some patches to fix this, but help from others along that line would be very welcome. I'm not super familiar with the backend and my ramp up time may be material.

Differential Revision: http://reviews.llvm.org/D15471

llvm-svn: 255737
2015-12-16 00:49:36 +00:00
Reid Kleckner 7850c9f5ca [WinEH] Make llvm.x86.seh.recoverfp work on x64
It adjusts from RSP-after-prologue to RBP, which is what SEH filters
need to do before they can use llvm.localrecover.

Fixes SEH filter captures, which were broken in r250088.

Issue reported by Alex Crichton.

llvm-svn: 255707
2015-12-15 23:40:58 +00:00
Tom Stellard 7750f4ed9e AMDGPU/SI: Set the code object work group segment size when targeting HSA
Reviewers: arsenm

Subscribers: arsenm, llvm-commits

Differential Revision: http://reviews.llvm.org/D15493

llvm-svn: 255702
2015-12-15 23:15:25 +00:00
Sanjay Patel 271efcdf20 [x86] inline calls to fmaxf / llvm.maxnum.f32 using maxss (PR24475)
This patch improves on the suggested codegen from PR24475:
https://llvm.org/bugs/show_bug.cgi?id=24475

but only for the fmaxf() case to start, so we can sort out any bugs before
extending to fmin, f64, and vectors.

The fmax / maxnum definitions provide us flexibility for signed zeros, so the
only thing we have to worry about in this replacement sequence is NaN handling.

Note 1: It may be better to implement this as lowerFMAXNUM(), but that exposes
a problem: SelectionDAGBuilder::visitSelect() transforms compare/select
instructions into FMAXNUM nodes if we declare FMAXNUM legal or custom. Perhaps
that should be checking for NaN inputs or global unsafe-math before transforming?
As it stands, that bypasses a big set of optimizations that the x86 backend 
already has in PerformSELECTCombine().

Note 2: The v2f32 test reveals another bug; the vector is extended to v4f32, so
we have completely unnecessary operations happening on undef elements of the 
vector.

Differential Revision: http://reviews.llvm.org/D15294

llvm-svn: 255700
2015-12-15 23:11:43 +00:00
Tom Stellard a495307e5e AMDGPU/SI: Set the code objects private segment size when targeting HSA.
Summary: I'm not sure how things worked before without this.

Reviewers: arsenm

Subscribers: arsenm, llvm-commits

Differential Revision: http://reviews.llvm.org/D15492

llvm-svn: 255692
2015-12-15 22:55:30 +00:00
Tom Stellard 29dd05e92f AMDGPU/SI: Emit constant variables in the .hsatext section when targeting HSA
Reviewers: arsenm

Subscribers: arsenm, llvm-commits

Differential Revision: http://reviews.llvm.org/D15426

llvm-svn: 255689
2015-12-15 22:39:36 +00:00
Dan Gohman 4b9d7916ee [WebAssembly] Implement instruction selection for constant offsets in addresses.
Add instruction patterns for matching load and store instructions with constant
offsets in addresses. The code is fairly redundant due to the need to replicate
everything between imm, tglobaldadr, and texternalsym, but this appears to be
common tablegen practice. The main alternative appears to be to introduce
matching functions with C++ code, but sticking with purely generated matchers
seems better for now.

Also note that this doesn't yet support offsets from getelementptr, which will
be the most common case; that will depend on a change in target-independent code
in order to set the NoUnsignedWrap flag, which I'll submit separately. Until
then, the testcase uses ptrtoint+add+inttoptr with a nuw on the add.

Also implement isLegalAddressingMode with an approximation of this.

Differential Revision: http://reviews.llvm.org/D15538

llvm-svn: 255681
2015-12-15 22:01:29 +00:00
David Majnemer 3bb88c0210 [WinEH] Use operand bundles to describe call sites
SimplifyCFG allows tail merging with code which terminates in
unreachable which, in turn, makes it possible for an invoke to end up in
a funclet which it was not originally part of.

Using operand bundles on invokes allows us to determine whether or not
an invoke was part of a funclet in the source program.

Furthermore, it allows us to unambiguously answer questions about the
legality of inlining into call sites which the personality may have
trouble with.

Differential Revision: http://reviews.llvm.org/D15517

llvm-svn: 255674
2015-12-15 21:27:27 +00:00
Tom Stellard a6f24c6565 AMDGPU/SI: Select constant loads with non-uniform addresses to MUBUF instructions
Summary:
We were previously selecting all constant loads to SMRD instructions and legalizing
the SMRDs with non-uniform addresses during the SIFixSGPRCopesPass.

This new solution is more simple and also generates much better code, because
the instruction selector is able to take advantage of all the MUBUF addressing
modes that are legalization pass wasn't able to.

We also no longer need to generate v_add_* instructions when we
have a uniform pointer and a non-uniform offset, as this is now folded into the
MUBUF instruction during instruction selection.

Reviewers: arsenm

Subscribers: arsenm, llvm-commits

Differential Revision: http://reviews.llvm.org/D15425

llvm-svn: 255672
2015-12-15 20:55:55 +00:00
James Y Knight 33beb24318 [Sparc] Fix handling of double incoming arguments on sparc little-endian.
On SparcV8, doubles get passed in two 32-bit integer registers. The call
code was already handling endianness correctly, but the incoming
argument code was not -- it got the two halves in opposite order.

Also remove some dead code in LowerFormalArguments_32 to handle
less-than-32bit values, which can't actually happen.

Finally, add some test cases for the 32-bit calling convention, cribbed
from the 64abi.ll test, and run for both big and little-endian.

llvm-svn: 255668
2015-12-15 19:23:12 +00:00
Michael Kuperstein 53946bf8c6 [X86] MOVPC32r should only emit CFI adjustments when needed
We only want to emit CFI adjustments when actually using DWARF.
This fixes PR25828.

Differential Revision: http://reviews.llvm.org/D15522

llvm-svn: 255664
2015-12-15 18:50:32 +00:00
Hans Wennborg 08d5905bac [X86] Smaller code for materializing 32-bit 1 and -1 constants
"movl $-1, %eax" is 5 bytes, "xorl %eax, %eax; decl %eax" is 3 bytes.
This commit makes LLVM use the latter when optimizing for size.

Differential Revision: http://reviews.llvm.org/D14971

llvm-svn: 255656
2015-12-15 17:10:28 +00:00
Krzysztof Parzyszek 372bd80834 [Hexagon] Preprocess mapped instructions before lowering to MC
llvm-svn: 255653
2015-12-15 17:05:45 +00:00
Tom Stellard 43f52df0b5 AMDGPU/SI: Add llvm.amdgcn.mbcnt.* intrinsics
Summary:
These are meant to be used instead of the llvm.SI.tid intrinsic which will
be deprecated at some point.

Reviewers: arsenm

Subscribers: arsenm, llvm-commits

Differential Revision: http://reviews.llvm.org/D15475

llvm-svn: 255652
2015-12-15 17:02:52 +00:00
Tom Stellard ad7d03daa6 AMDGPU/SI: Add llvm.amdgcn.v.interp.p[12] intrinsics
Summary:
These are meant to be used instead of the llvm.SI.fs.interp intrinsic which
will be deprecated at some point.

Reviewers: arsenm

Subscribers: arsenm, llvm-commits

Differential Revision: http://reviews.llvm.org/D15474

llvm-svn: 255651
2015-12-15 17:02:49 +00:00
Nemanja Ivanovic 8922476bcb Bitcasts between FP and INT values using direct moves
This patch corresponds to review:
http://reviews.llvm.org/D15286

This patch was meant to land in revision 255246, but I accidentally uploaded
the patch that corresponds to http://reviews.llvm.org/D15372 in that revision
accidentally.

Thereby, this patch is the actual Bitcasts using direct moves patch, whereas
http://reviews.llvm.org/rL255246 actually corresponds to
http://reviews.llvm.org/D15372.

llvm-svn: 255649
2015-12-15 14:50:34 +00:00
Michael Kuperstein 801ee74167 Do not try to use i8 and i16 versions of FP_TO_U/SINT soft float library calls
It appears that neither compiler-rt nor the gnu soft-float libraries actually
implement these conversions. Instead of emitting calls to library functions
that don't exist, handle it similarly to the way we handle i8 -> float and
i16 -> float conversions: call the i32 library function, and adjust the type.

Differential Revision: http://reviews.llvm.org/D15151

llvm-svn: 255643
2015-12-15 12:55:50 +00:00
Cong Hou 3ba9cf6020 Improve the successor list update in TailDuplication.cpp.
This patch improves a temporary fix in r255530 so that we can normalize
successor list without trigger assertion failures in tail duplication pass.

llvm-svn: 255638
2015-12-15 10:10:40 +00:00
Elena Demikhovsky 6015f5c823 Type legalizer for masked gather and scatter intrinsics.
Full type legalizer that works with all vectors length - from 2 to 16, (i32, i64, float, double).

This intrinsic, for example
void @llvm.masked.scatter.v2f32(<2 x float>%data , <2 x float*>%ptrs , i32 align , <2 x i1>%mask )
requires type widening for data and type promotion for mask.

Differential Revision: http://reviews.llvm.org/D13633

llvm-svn: 255629
2015-12-15 08:40:41 +00:00
Quentin Colombet b82786e0ff [ShrinkWrapping] Do not choose restore point inside loops.
The post-dominance property is not sufficient to guarantee that a restore point
inside a loop is safe.
E.g.,
 while(1) {
   Save
   Restore
   if (...)
     break;
   use/def CSRs
 }
All the uses/defs of CSRs are dominated by Save and post-dominated
by Restore. However, the CSRs uses are still reachable after
Restore and before Save are executed.

This fixes PR25824

llvm-svn: 255613
2015-12-15 03:28:11 +00:00
Dan Gohman dcba338188 [WebAssembly] Remove .import printing.
For now, LLVM doesn't know about wasm module imports, so it shouldn't
emit .import directives.

llvm-svn: 255602
2015-12-15 02:20:44 +00:00
JF Bastien 65f0a71f40 WebAssembly: test global array indexing
This case was tested in the linker from code, but not from globals indexing into other globals. The linker currently barfs on this, ncbray volunteered to fix it.

llvm-svn: 255601
2015-12-15 02:02:51 +00:00
Dan Gohman c7c0445443 [WebAssembly] Add type prefixes to call instructions
Add return type information to call and call_indirect instructions. This
allows them to be disambiguated without knowledge of the callee.

Differential Revision: http://reviews.llvm.org/D15484

llvm-svn: 255565
2015-12-14 22:56:51 +00:00
Dan Gohman 8fe7e86bf5 [WebAssembly] Implement a new algorithm for placing BLOCK markers
Implement a new BLOCK scope placement algorithm which better handles
early-return blocks and early exists from nested scopes.

Differential Revision: http://reviews.llvm.org/D15368

llvm-svn: 255564
2015-12-14 22:51:54 +00:00
Chih-Hung Hsieh 7993e18e80 [X86] Part 2 to fix x86-64 fp128 calling convention.
Part 1 was submitted in http://reviews.llvm.org/D15134.
Changes in this part:
* X86RegisterInfo.td, X86RecognizableInstr.cpp: Add FR128 register class.
* X86CallingConv.td: Pass f128 values in XMM registers or on stack.
* X86InstrCompiler.td, X86InstrInfo.td, X86InstrSSE.td:
  Add instruction selection patterns for f128.
* X86ISelLowering.cpp:
  When target has MMX registers, configure MVT::f128 in FR128RegClass,
  with TypeSoftenFloat action, and custom actions for some opcodes.
  Add missed cases of MVT::f128 in places that handle f32, f64, or vector types.
  Add TODO comment to support f128 type in inline assembly code.
* SelectionDAGBuilder.cpp:
  Fix infinite loop when f128 type can have
  VT == TLI.getTypeToTransformTo(Ctx, VT).
* Add unit tests for x86-64 fp128 type.

Differential Revision: http://reviews.llvm.org/D11438

llvm-svn: 255558
2015-12-14 22:08:36 +00:00
David Majnemer bbfc7219ef [IR] Remove terminatepad
It turns out that terminatepad gives little benefit over a cleanuppad
which calls the termination function.  This is not sufficient to
implement fully generic filters but MSVC doesn't support them which
makes terminatepad a little over-designed.

Depends on D15478.

Differential Revision: http://reviews.llvm.org/D15479

llvm-svn: 255522
2015-12-14 18:34:23 +00:00
Paul Robinson accc3e0376 FastISel needs to remove dead code when it bails out.
When FastISel fails to translate an instruction it hands off code
generation to SelectionDAG. Before it does so, it may have generated
local value instructions to feed phi nodes in successor blocks. These
instructions will then be generated again by SelectionDAG, causing
duplication and less efficient code, including extra spill
instructions.

Patch by Wolfgang Pieb!

Differential Revision: http://reviews.llvm.org/D11768

llvm-svn: 255520
2015-12-14 18:33:18 +00:00
Petar Jovanovic 280f7101e8 [Power PC] llvm soft float support for ppc32
This is the second in a set of patches for soft float support for ppc32,
it enables soft float operations.

Patch by Strahinja Petrovic.

Differential Revision: http://reviews.llvm.org/D13700

llvm-svn: 255516
2015-12-14 17:57:33 +00:00
Matt Arsenault d079285e05 AMDGPU: Use generic bitreverse intrinsic
Also fix bug in vector legalization for bitreverse.

llvm-svn: 255512
2015-12-14 17:25:38 +00:00
Matt Arsenault 52a52a564b AMDGPU: Fix splitting vector loads with existing offsets
If the original MMO had an offset, it was dropped.
Also use the correct alignment after adding the new offset.

llvm-svn: 255508
2015-12-14 16:59:40 +00:00
Saleem Abdulrasool 778c268594 ARM: only emit EABI attributes on EABI targets
EABI attributes should only be emitted on EABI targets.  This prevents the
emission of the optimization goals EABI attribute on Windows ARM.

llvm-svn: 255448
2015-12-13 05:27:45 +00:00
Simon Pilgrim 052191dd82 [X86][AVX512] Added support for VMOVQ shuffle comments
llvm-svn: 255442
2015-12-12 21:46:23 +00:00
Manuel Jacob 1578ec8860 Partially fix memcpy / memset / memmove lowering in SelectionDAG construction if address space != 0.
Summary:
Previously SelectionDAGBuilder asserted that the pointer operands of
memcpy / memset / memmove intrinsics are in address space < 256.  This assert
implicitly assumed the X86 backend, where all address spaces < 256 are
equivalent to address space 0 from the code generator's point of view.  On some
targets (R600 and NVPTX) several address spaces < 256 have a target-defined
meaning, so this assert made little sense for these targets.

This patch removes this wrong assertion and adds extra checks before lowering
these intrinsics to library calls.  If a pointer operand can't be casted to
address space 0 without changing semantics, a fatal error is reported to the
user.

The new behavior should be valid for all targets that give address spaces != 0
a target-specified meaning (NVPTX, R600, X86).  NVPTX lowers big or
variable-sized memory intrinsics before SelectionDAG construction.  All other
memory intrinsics are inlined (the threshold is set very high for this target).
R600 doesn't support memcpy / memset / memmove library calls (previously the
illegal emission of a call to such library function triggered an error
somewhere in the code generator).  X86 now emits inline loads and stores for
address spaces 256 and 257 up to the same threshold that is used for address
space 0 and reports a fatal error otherwise.

I call this a "partial fix" because there are still cases that can't be
lowered.  A fatal error is reported in these cases.

Reviewers: arsenm, theraven, compnerd, hfinkel

Subscribers: hfinkel, llvm-commits, alex

Differential Revision: http://reviews.llvm.org/D7241

llvm-svn: 255441
2015-12-12 21:33:31 +00:00
Simon Pilgrim a2d1591876 [X86][AVX] Tests tidyup
Cleanup/regenerate some tests for some upcoming patches.

llvm-svn: 255432
2015-12-12 12:52:52 +00:00
David Majnemer 8a1c45d6e8 [IR] Reformulate LLVM's EH funclet IR
While we have successfully implemented a funclet-oriented EH scheme on
top of LLVM IR, our scheme has some notable deficiencies:
- catchendpad and cleanupendpad are necessary in the current design
  but they are difficult to explain to others, even to seasoned LLVM
  experts.
- catchendpad and cleanupendpad are optimization barriers.  They cannot
  be split and force all potentially throwing call-sites to be invokes.
  This has a noticable effect on the quality of our code generation.
- catchpad, while similar in some aspects to invoke, is fairly awkward.
  It is unsplittable, starts a funclet, and has control flow to other
  funclets.
- The nesting relationship between funclets is currently a property of
  control flow edges.  Because of this, we are forced to carefully
  analyze the flow graph to see if there might potentially exist illegal
  nesting among funclets.  While we have logic to clone funclets when
  they are illegally nested, it would be nicer if we had a
  representation which forbade them upfront.

Let's clean this up a bit by doing the following:
- Instead, make catchpad more like cleanuppad and landingpad: no control
  flow, just a bunch of simple operands;  catchpad would be splittable.
- Introduce catchswitch, a control flow instruction designed to model
  the constraints of funclet oriented EH.
- Make funclet scoping explicit by having funclet instructions consume
  the token produced by the funclet which contains them.
- Remove catchendpad and cleanupendpad.  Their presence can be inferred
  implicitly using coloring information.

N.B.  The state numbering code for the CLR has been updated but the
veracity of it's output cannot be spoken for.  An expert should take a
look to make sure the results are reasonable.

Reviewers: rnk, JosephTremoulet, andrew.w.kaylor

Differential Revision: http://reviews.llvm.org/D15139

llvm-svn: 255422
2015-12-12 05:38:55 +00:00
Chen Li 1b26b9ec9d [X86ISelLowering] Add additional support for multiplication-to-shift conversion.
Summary: This patch adds support of conversion (mul x, 2^N + 1) => (add (shl x, N), x) and (mul x, 2^N - 1) => (sub (shl x, N), x) if the multiplication can not be converted to LEA + SHL or LEA + LEA. LLVM has already supported this on ARM, and it should also be useful on X86. Note the patch currently only applies to cases where the constant operand is positive, and I am planing to add another patch to support negative cases after this.

Reviewers: craig.topper, RKSimon

Subscribers: aemerson, llvm-commits

Differential Revision: http://reviews.llvm.org/D14603

llvm-svn: 255415
2015-12-12 01:04:15 +00:00
Hal Finkel 4d3da9c29b Fix test/CodeGen/PowerPC/ppc-shrink-wrapping.ll after r255398
llvm-svn: 255414
2015-12-12 00:42:05 +00:00
Hal Finkel 65539e3c94 [PowerPC] Add Branch Hints for Highly-Biased Branches
This branch adds hints for highly biased branches on the PPC architecture. Even
in absence of profiling information, LLVM will mark code reaching unreachable
terminators and other exceptional control flow constructs as highly unlikely to
be reached.

Patch by Tom Jablin!

llvm-svn: 255398
2015-12-12 00:32:00 +00:00
Chen Li 02ef2e1385 Revert rL255391: [X86ISelLowering] Add additional support for multiplication-to-shift conversion.
because it broke buildbot.

llvm-svn: 255395
2015-12-12 00:08:37 +00:00
Derek Schuff 9769debf88 [WebAssembly] Implement prolog/epilog insertion and FrameIndex elimination
Summary:
Use the SP32 physical register as the base for FrameIndex
lowering. Update it and the __stack_pointer global var in the prolog and
epilog. Extend the mapping of virtual registers to wasm locals to
include the physical registers.

Rather than modify the target-independent PrologEpilogInserter (which
asserts that there are no virtual registers left) include a
slightly-modified copy for Wasm that does not have this assertion and
only clears the virtual registers if scavenging was needed (which of
course it isn't for wasm).

Differential Revision: http://reviews.llvm.org/D15344

llvm-svn: 255392
2015-12-11 23:49:46 +00:00
Chen Li e8f9387e0c [X86ISelLowering] Add additional support for multiplication-to-shift conversion.
Summary: This patch adds support of conversion (mul x, 2^N + 1) => (add (shl x, N), x) and (mul x, 2^N - 1) => (sub (shl x, N), x) if the multiplication can not be converted to LEA + SHL or LEA + LEA. LLVM has already supported this on ARM, and it should also be useful on X86. Note the patch currently only applies to cases where the constant operand is positive, and I am planing to add another patch to support negative cases after this.

Reviewers: craig.topper, RKSimon

Subscribers: aemerson, llvm-commits

Differential Revision: http://reviews.llvm.org/D14603

llvm-svn: 255391
2015-12-11 23:39:32 +00:00
Matt Arsenault fabab4b7dd SelectionDAG: Match min/max if the scalar operation is legal
llvm-svn: 255388
2015-12-11 23:16:47 +00:00
Hal Finkel cd8664c3c2 Revert r248483, r242546, r242545, and r242409 - absdiff intrinsics
After much discussion, ending here:

  http://lists.llvm.org/pipermail/llvm-commits/Week-of-Mon-20151123/315620.html

it has been decided that, instead of having the vectorizer directly generate
special absdiff and horizontal-add intrinsics, we'll recognize the relevant
reduction patterns during CodeGen. Accordingly, these intrinsics are not needed
(the operations they represent can be pattern matched, as is already done in
some backends). Thus, we're backing these out in favor of the current
development work.

r248483 - Codegen: Fix llvm.*absdiff semantic.
r242546 - [ARM] Use [SU]ABSDIFF nodes instead of intrinsics for VABD/VABA
r242545 - [AArch64] Use [SU]ABSDIFF nodes instead of intrinsics for ABD/ABA
r242409 - [Codegen] Add intrinsics 'absdiff' and corresponding SDNodes for absolute difference operation

llvm-svn: 255387
2015-12-11 23:11:52 +00:00
Matthias Braun 60d69e2865 CodeGen: Redo analyzePhysRegs() and computeRegisterLiveness()
computeRegisterLiveness() was broken in that it reported dead for a
register even if a subregister was alive. I assume this was because the
results of analayzePhysRegs() are hard to understand with respect to
subregisters.

This commit: Changes the results of analyzePhysRegs (=struct
PhysRegInfo) to be clearly understandable, also renames the fields to
avoid silent breakage of third-party code (and improve the grammar).

Fix all (two) users of computeRegisterLiveness() in llvm: By reenabling
it and removing workarounds for the bug.

This fixes http://llvm.org/PR24535 and http://llvm.org/PR25033

Differential Revision: http://reviews.llvm.org/D15320

llvm-svn: 255362
2015-12-11 19:42:09 +00:00
Kyle Butt 1452b76f1f [PPC]: Peephole optimize small accesss to aligned globals.
Access to aligned globals gives us a chance to peephole optimize nonzero
offsets. If a struct is 4 byte aligned, then accesses to bytes 0-3 won't
overflow the available displacement. For example:
        addis 3, 2, b4v@toc@ha
        addi 4, 3, b4v@toc@l
        lbz 5, b4v@toc@l(3) ; This is the result of the current peephole
        lbz 6, 1(4)         ; optimizer
        lbz 7, 2(4)
        lbz 8, 3(4)
If b4v is 4-byte aligned, we can skip using register 4 because we know
that b4v@toc@l+{1,2,3} won't overflow 32K, and instead generate:
        addis 3, 2, b4v@toc@ha
        lbz 4, b4v@toc@l(3)
        lbz 5, b4v@toc@l+1(3)
        lbz 6, b4v@toc@l+2(3)
        lbz 7, b4v@toc@l+3(3)
Saving a register and an addition.
Larger alignments allow larger structures/arrays to be optimized.

llvm-svn: 255319
2015-12-11 00:47:36 +00:00
Eric Christopher 325e8d06dc Fix (bitcast (fabs x)), (bitcast (fneg x)) and (bitcast (fcopysign cst,
x)) combines for ppc_fp128, since signbit computation is more
complicated.

Discussion thread:
http://lists.llvm.org/pipermail/llvm-dev/2015-November/092863.html

Patch by Tim Shen!

llvm-svn: 255305
2015-12-10 22:09:06 +00:00
Kyle Butt 28b01a51b3 PPC: Teach FMA mutate to respect register classes.
This was causing bad code gen and assembly that won't assemble, as
mixed altivec and vsx code would end up with a vsx high register
assigned to an altivec instruction, which won't work. Constraining the
classes allows the optimization to proceed.

llvm-svn: 255299
2015-12-10 21:28:40 +00:00
Simon Pilgrim 06ea4be281 [DAGCombiner] Fix PR25763 - vector comparison constant folding + sign-extension
PR25763 demonstrated an issue with D14683 - vector comparison constant folding only works for i1 results, so we need to split off the sign-extension of the result to the required type. Luckily this can be done with the existing type legalization code.

llvm-svn: 255289
2015-12-10 19:47:06 +00:00
Pirama Arumuga Nainar 1317d5f311 Fix fptosi, fptoui from f16 vectors to i8, i16 vectors
Summary:
Convert f16 vectors to corresponding f32 vectors before doing the
conversion to int.

Add tests for v4f16, v8f16.

Reviewers: ab, jmolloy

Subscribers: llvm-commits, srhines

Differential Revision: http://reviews.llvm.org/D14936

llvm-svn: 255263
2015-12-10 17:16:49 +00:00
Dan Gohman 28818d7840 [WebAssembly] Tighten up several CHECK tests.
llvm-svn: 255255
2015-12-10 14:52:34 +00:00
Nemanja Ivanovic ac8d01add0 Bitcasts between FP and INT values using direct moves
This patch corresponds to review:
http://reviews.llvm.org/D15286

LLVM IR frequently contains bitcast operations between floating point and
integer values of the same width. Doing this through memory operations is
quite expensive on PPC. This patch allows the use of direct register moves
between FPRs and GPRs for lowering bitcasts.

llvm-svn: 255246
2015-12-10 13:35:28 +00:00
Dan Gohman f170ba08af [WebAssembly] Implement mixed-type ISD::FCOPYSIGN.
ISD::FCOPYSIGN permits its operands to have differing types, and DAGCombiner
uses this. Add some def : Pat rules to expand this out into an explicit
conversion and a normal copysign operation.

llvm-svn: 255220
2015-12-10 04:55:31 +00:00
Dan Gohman 9341c1d4b3 [WebAssembly] Implement fma.
It is lowered to a libcall for now, but this is expected to change in the future.

llvm-svn: 255219
2015-12-10 04:52:33 +00:00
Tom Stellard c93fc11f36 AMDGPU/SI: Emit constant arrays in the .text section
Summary:
This allows us to remove the END_OF_TEXT_LABEL hack we had been using
and simplifies the fixups used to compute the address of constant
arrays.

Reviewers: arsenm

Subscribers: arsenm, llvm-commits

Differential Revision: http://reviews.llvm.org/D15257

llvm-svn: 255204
2015-12-10 02:13:01 +00:00
Tom Stellard b3c3bda512 AMDGPU/SI: Add support for sgpr and vgpr inline assembly constraints
Summary: The 's' constraint represents sgprs and the 'v' constraint represents vgprs.

Reviewers: arsenm, echristo

Subscribers: arsenm, llvm-commits

Differential Revision: http://reviews.llvm.org/D15342

llvm-svn: 255203
2015-12-10 02:12:53 +00:00
Dan Gohman 60bddf17c5 [WebAssembly] Fix legalization of f32->f64 EXTLOAD.
llvm-svn: 255202
2015-12-10 02:07:53 +00:00
Dan Gohman a5603b835b [WebAssembly] Also legalize sign_extend_inreg of i32->i64.
llvm-svn: 255191
2015-12-10 01:00:19 +00:00
Dan Gohman dab313e0ed PeepholeOptimizer: Ignore dead implicit defs
Target-specific instructions may have uninteresting physreg clobbers,
for target-specific reasons. The peephole pass doesn't need to concern
itself with such defs, as long as they're implicit and marked as dead.

llvm-svn: 255182
2015-12-10 00:37:51 +00:00
Dan Gohman a8483755d3 [WebAssembly] Fix legalization of shift operators with illegal types.
llvm-svn: 255181
2015-12-10 00:26:26 +00:00
Dan Gohman df00a9ebc2 [WebAssembly] Implement anyext.
llvm-svn: 255179
2015-12-10 00:17:35 +00:00
Quentin Colombet 5d2f7cfd44 [X86] Enable shrink-wrapping by default, but keep it disabled for stack frames
without a frame pointer when unwind may happen.
This is a workaround for a bug in the way we emit the CFI directives for
frameless unwind information. See PR25614.

llvm-svn: 255175
2015-12-09 23:08:18 +00:00
Dan Gohman 1cf96c0c34 [WebAssembly] Reintroduce ARGUMENT moving logic
Reinteroduce the code for moving ARGUMENTS back to the top of the basic block.
While the ARGUMENTS physical register prevents sinking and scheduling from
moving them, it does not appear to be sufficient to prevent SelectionDAG from
moving them down in the initial schedule. This patch introduces a patch that
moves them back to the top immediately after SelectionDAG runs.

This is still hopefully a temporary solution. http://reviews.llvm.org/D14750 is
one alternative, though the review has not been favorable, and proposed
alternatives are longer-term and have other downsides.

This fixes the main outstanding -verify-machineinstrs failures, so it adds
-verify-machineinstrs to several tests.

Differential Revision: http://reviews.llvm.org/D15377

llvm-svn: 255125
2015-12-09 16:23:59 +00:00
Tim Northover d91d635b36 ARM: don't use a deleted node as the BaseReg in complex pattern.
We mutated the DAG, which invalidated the node we were trying to use
as a base register. Sometimes we got away with it, but other times the
node really did get deleted before it was finished with.

Should fix PR25733

llvm-svn: 255120
2015-12-09 15:54:50 +00:00
Robert Lougher f0033b29d4 Fix cycle in selection DAG introduced by extractelement legalization
During selection DAG legalization, extractelement is replaced with a load
instruction.  To do this, a temporary store to the stack is used unless an
existing store is found that can be re-used.
    
If re-using a store, the chain going out of the store must be replaced by
the one going out of the new load (this ensures that any stores that must
take place after the store happens after the load, else the value might
be overwritten before it is loaded).
    
The problem is, if the extractelement index is dependent on the store
replacing the chain will introduce a cycle in the selection DAG (the load
uses the index, and by replacing the chain we will make the index dependent
on the load).
    
To fix this, if the index is dependent on the store, the store is skipped.
This is conservative as we may end up creating an unnecessary extra store
to the stack.  However, the situation is not expected to occur very often.

Differential Revision: http://reviews.llvm.org/D15330

llvm-svn: 255114
2015-12-09 14:34:10 +00:00
Ahmed Bougacha 97564c3a1b [AArch64][ARM] Don't base interleaved op legality on type alloc size.
Otherwise, we think that most types that look like they'd fit in a
legal vector type are legal (so, basically, *any* vector type with a
size between 33 and 128 bits, I think, since we use pow2 alignment;
e.g., v2i25, v3f32, ...).

DataLayout::getTypeAllocSize rounds up based on alignment.
When checking for target intrinsic legality, that's not what we want:
if rounding makes a difference, the type isn't legal, and the
target intrinsics shouldn't be used, as they are always assumed legal.

One could make the argument that alloc size is ultimately the most
relevant here, since we're dealing with LD/ST intrinsics. That's only
true if we did legalize them though; that's a problem for another day.

Use DataLayout::getTypeSizeInBits instead of getTypeAllocSizeInBits.
Type::getSizeInBits can't be used because that'd gratuitously break
pointer vector support.

Some of these uses are currently fine, because we only hit them when
the type is already known legal (e.g., r114454). Update them for
consistency. It's faster to avoid the rounding anyway!

llvm-svn: 255089
2015-12-09 01:19:50 +00:00
Vyacheslav Klochkov a3cd08b05c X86-FMA3: Defined the ExeDomain property for Scalar FMA3 opcodes.
Reviewer: Simon Pilgrim.
Differential Revision: http://reviews.llvm.org/D15317

llvm-svn: 255080
2015-12-09 00:12:13 +00:00
Pirama Arumuga Nainar e6ccd7b66a Define selection for v4f16, v8f16 scalar_to_vector
Summary:
This fixes failure when trying to select
    insertelement <4 x half> undef, half %a, i64 0
which gets transformed to a scalar_to_vector node.

The accompanying v4 and v8 tests fail instruction selection without this
patch.

Reviewers: ab, jmolloy

Subscribers: srhines, llvm-commits

Differential Revision: http://reviews.llvm.org/D15322

llvm-svn: 255072
2015-12-08 23:07:06 +00:00
Simon Pilgrim 323e00d9c7 [X86][AVX] Fold loads + splats into broadcast instructions
On AVX and AVX2, BROADCAST instructions can load a scalar into all elements of a target vector.

This patch improves the lowering of 'splat' shuffles of a loaded vector into a broadcast - currently the lowering only works for cases where we are splatting the zero'th element, which is now generalised to any element.

Fix for PR23022

Differential Revision: http://reviews.llvm.org/D15310

llvm-svn: 255061
2015-12-08 22:17:11 +00:00
Simon Pilgrim 0aea1b89eb [X86][SSE4A] Added fast-isel intrinsics tests
As discussed on PR24580, this patch adds fast-isel codegen tests to match the IR generated in clang/test/CodeGen/sse4a-builtins.c

llvm-svn: 255053
2015-12-08 21:43:41 +00:00
Simon Pilgrim 0ca7cb6334 [X86][SSSE3] Added fast-isel intrinsics tests
As discussed on PR24580, this patch adds fast-isel codegen tests to match the IR generated in clang/test/CodeGen/ssse3-builtins.c

llvm-svn: 255052
2015-12-08 21:32:08 +00:00
Simon Pilgrim 9d76810949 [X86][SSE3] Added fast-isel intrinsics tests
As discussed on PR24580, this patch adds fast-isel codegen tests to match the IR generated in clang/test/CodeGen/sse3-builtins.c

llvm-svn: 255051
2015-12-08 21:27:19 +00:00
Artyom Skrobov 0a37b80bcb Fix ARMv4T (Thumb1) epilogue generation
Summary:
Before ARMv5T, Thumb1 code could not pop PC, as described at D14357 and D14986;
so we need the special fixup in the epilogue.

Reviewers: jroelofs, qcolombet

Subscribers: aemerson, llvm-commits, rengolin

Differential Revision: http://reviews.llvm.org/D15126

llvm-svn: 255047
2015-12-08 19:59:01 +00:00
Ron Lieberman e6540e244a [Hexagon] Add NewValueJump support for C4_cmpneq, C4_cmplte, C4_cmplteu
llvm-svn: 255027
2015-12-08 16:28:32 +00:00
Justin Bogner 0ebc8605ad IR: Allow vectors of halfs to be ConstantDataVectors
Currently, vectors of halfs end up as ConstantVectors, but there isn't
a good reason they can't be ConstantDataVectors. This should save some
memory.

llvm-svn: 254991
2015-12-08 03:01:16 +00:00
Justin Bogner 3135ba9b38 AsmPrinter: Use emitGlobalConstantFP to emit elements of constant data
It's strange to duplicate the logic for emitting FP values into
emitGlobalConstantDataSequential, and it's even stranger that we end
up printing the verbose assembly comments differently between the two
paths. Just call into emitGlobalConstantFP rather than crudely
duplicating its logic.

llvm-svn: 254988
2015-12-08 02:37:48 +00:00
Manman Ren cb8470b4b5 [CXX TLS calling convention] Add support for AArch64.
rdar://9001553

llvm-svn: 254978
2015-12-08 00:14:38 +00:00
Kit Barton a1c712fae5 [PPC64] Convert bool literals to i32
Convert i1 values to i32 values if they should be allocated in GPRs instead of CRs.

Phabricator: http://reviews.llvm.org/D14064
llvm-svn: 254942
2015-12-07 20:50:29 +00:00
Simon Pilgrim 69aa463780 Fix line endings
llvm-svn: 254939
2015-12-07 20:36:00 +00:00
Ron Lieberman c5e20a41a0 [Hexagon] Adding v60 test, vasr in particular.
llvm-svn: 254923
2015-12-07 18:52:39 +00:00
Sanjay Patel fe2e9121e2 Tighten checks so we can see existing codegen
The 2-element vector case shows a surprising bug: we failed to
eliminate ops on undefs, so there are 4 fmax calls even though
there can only be 2 valid elements in the inputs.

llvm-svn: 254920
2015-12-07 17:39:48 +00:00
Elena Demikhovsky 291fe0159f VX-512: Fixed a bug in FP logic operation lowering
FP logic instructions are supported in DQ extension on AVX-512 target.
I use integer operations instead.
Added tests.
I also enabled FABS in this patch in order to check ANDPS.
The operations are FOR, FXOR, FAND, FANDN.
The instructions, that supported for 512-bit vector under DQ are:
VORPS/PD, VXORPS/PD, VANDPS/PD, FANDNPS/PD.

Differential Revision: http://reviews.llvm.org/D15110

llvm-svn: 254913
2015-12-07 14:33:34 +00:00
Artyom Skrobov e9b3fb8603 [ARM] Generate ABI_optimization_goals build attribute, as described in the ARM ARM.
Summary: This reverts r254234, and adds a simple fix for the annoying case of use-after-free.

Reviewers: rengolin

Subscribers: aemerson, llvm-commits, rengolin

Differential Revision: http://reviews.llvm.org/D15236

llvm-svn: 254912
2015-12-07 14:22:39 +00:00
Elena Demikhovsky 33e61eceb4 AVX-512: Fixed masked load / store instruction selection for KNL.
Patterns were missing for KNL target for <8 x i32>, <8 x float> masked load/store.

This intrinsic comes with all legal types:
<8 x float> @llvm.masked.load.v8f32(<8 x float>* %addr, i32 align, <8 x i1> %mask, <8 x float> %passThru),
but still requires lowering, because VMASKMOVPS, VMASKMOVDQU32 work with 512-bit vectors only.

All data operands should be widened to 512-bit vector.
The mask operand should be widened to v16i1 with zeroes.

Differential Revision: http://reviews.llvm.org/D15265

llvm-svn: 254909
2015-12-07 13:39:24 +00:00
Igor Breger 3ab6f17530 AVX-512: implement kunpck intrinsics.
Differential Revision: http://reviews.llvm.org/D14821

llvm-svn: 254908
2015-12-07 13:25:18 +00:00
Bradley Smith d5a1f47a63 [ARM] Flag vcvt{t,b} with an f16 type specifier as part of the FP16 extension
Additionally correct the Cortex-R7 definition to allow the FP16 feature.

llvm-svn: 254900
2015-12-07 10:54:36 +00:00
Simon Pilgrim 12301b0814 [X86][AVX] Added tests to load+broadcast non-zero'th vector elements
Baseline for an upcoming patch for PR23022

llvm-svn: 254898
2015-12-07 09:09:54 +00:00
Keno Fischer 0ef8ccf968 [Verifier] Fix !dbg validation if Scope is the Subprogram
Summary:
We are inserting both Scope and SP into the Seen map and check whether
it was already there in which case we skip the validation (the idea
being that we already checked this Subprogram before). However,
if (Scope == SP) as MDNodes, then inserting the Scope, will trigger
the Seen check causing us to incorrectly not validate this !dbg
attachment. Fix this by not performing the SP Seen check if Scope == SP

Reviewers: pcc, dexonsmith, dblaikie

Subscribers: dblaikie, llvm-commits

Differential Revision: http://reviews.llvm.org/D14697

llvm-svn: 254887
2015-12-06 23:05:38 +00:00
Simon Pilgrim 38e93ea1cc [X86][AVX] Tidied up BROADCASTPD/BROADCASTPS tests
Regenerate tests using update_llc_test_checks.py

llvm-svn: 254886
2015-12-06 20:12:19 +00:00
Dan Gohman a4b710a74f [WebAssembly] Enable folding of offsets into global variable addresses.
llvm-svn: 254882
2015-12-06 19:33:32 +00:00
Dan Gohman 6ddce716cb [WebAssembly] Tighten up some testcase regular expressions.
llvm-svn: 254881
2015-12-06 19:31:44 +00:00
Sanjay Patel 6226e6d993 [x86] add missing maxnum/minnum tests for 256-bit vectors
Also, switch to x86-64 because once we can lower these to something
more reasonable, there will be less noise in the checks. And add
AVX runs because those will be different than SSE.

llvm-svn: 254879
2015-12-06 18:05:12 +00:00
Asaf Badouh 41ecf460fa [X86][AVX512] add vmovss/sd missing encoding
Differential Revision: http://reviews.llvm.org/D14701

llvm-svn: 254875
2015-12-06 13:26:56 +00:00
Michael Kuperstein 77ce9d3b1a [X86] Always generate precise CFA adjustments.
This removes the code path that generate "synchronous" (only correct at call site) CFA.
We will probably want to re-introduce it once we are capable of emitting different
.eh_frame and .debug_frame sections.

Differential Revision: http://reviews.llvm.org/D14948

llvm-svn: 254874
2015-12-06 13:06:20 +00:00
Igor Breger 076dfe5c12 AVX512: support AVX512BW Intrinsic in 32bit mode.
Differential Revision: http://reviews.llvm.org/D15076

llvm-svn: 254873
2015-12-06 11:35:18 +00:00
Dan Gohman d85c3b1fbc [WebAssembly] Don't perform the returned-argument optimization on constants.
llvm-svn: 254866
2015-12-05 22:12:39 +00:00
Dan Gohman e2a7a8278f [WebAssembly] Implement direct calls to external symbols.
llvm-svn: 254863
2015-12-05 20:41:36 +00:00
Sanjay Patel f413410f55 Add vector fmaxnum tests that correspond to the existing fminnum tests
Note: missing 256-bit tests for min and max should also be added.
llvm-svn: 254862
2015-12-05 20:27:10 +00:00
Dan Gohman 284384b640 [WebAssembly] Support inline asm constraints of type i16 and similar.
llvm-svn: 254861
2015-12-05 20:03:44 +00:00
Sanjay Patel 1c7692b881 fix typo; NFC
llvm-svn: 254860
2015-12-05 19:54:59 +00:00
Simon Pilgrim 4ba5969224 [X86][ADX] Added memory folding patterns and stack folding tests
llvm-svn: 254844
2015-12-05 07:27:50 +00:00
Simon Pilgrim 5a64d98303 [X86][FMA4] Explicitly set the domain of FMA4 float/double scalar instructions
Both were defaulting to the float domain - now matches the packed instructions.

llvm-svn: 254841
2015-12-05 07:07:42 +00:00
Cong Hou 833fe143f5 Normalize successors' probabilities when building MBBs for jump table.
llvm-svn: 254837
2015-12-05 05:00:55 +00:00
Dan Gohman f0b165a7f8 [WebAssembly] Implement ReverseBranchCondition, and re-enable MachineBlockPlacement
This patch introduces a codegen-only instruction currently named br_unless,
which makes it convenient to implement ReverseBranchCondition and re-enable
the MachineBlockPlacement pass. Then in a late pass, it lowers br_unless
back into br_if.

Differential Revision: http://reviews.llvm.org/D14995

llvm-svn: 254826
2015-12-05 03:03:35 +00:00
Dan Gohman 4da4abd87f [WebAssembly] Fix scheduling dependencies in register-stackified code
Add physical register defs to instructions used from stackified
instructions to prevent them from being scheduled into the middle of
a stack sequence. This is a conservative measure which may be loosened
in the future.

Differential Revision: http://reviews.llvm.org/D15252

llvm-svn: 254811
2015-12-05 00:51:40 +00:00
Derek Schuff 9d77952332 [WebAssembly] Support constant offsets on loads and stores
This is just prototype for load/store for i32 types. I'll add them to
the rest of the types if we like this direction.

Differential Revision: http://reviews.llvm.org/D15197

llvm-svn: 254807
2015-12-05 00:26:39 +00:00
Dan Gohman 35bfb24c28 [WebAssembly] Initial varargs support.
Full varargs support will depend on prologue/epilogue support, but this patch
gets us started with most of the basic infrastructure.

Differential Revision: http://reviews.llvm.org/D15231

llvm-svn: 254799
2015-12-04 23:22:35 +00:00
Hans Wennborg 5000ce8a63 X86: Don't emit SAHF/LAHF for 64-bit targets unless explicitly supported
These instructions are not supported by all CPUs in 64-bit mode. Emitting them
causes Chromium to crash on start-up for users with such chips.

(GCC puts these instructions behind -msahf on 64-bit for the same reason.)

This patch adds FeatureLAHFSAHF, enables it by default for 32-bit targets
and modern CPUs, and changes X86InstrInfo::copyPhysReg back to the lowering
from before r244503 when the instructions are not available.

Differential Revision: http://reviews.llvm.org/D15240

llvm-svn: 254793
2015-12-04 23:00:33 +00:00
Chad Rosier f3491496dc [AArch64] Expand vector SDIVREM/UDIVREM operations.
http://reviews.llvm.org/D15214
Patch by Ana Pazos <apazos@codeaurora.org>!

llvm-svn: 254773
2015-12-04 21:38:44 +00:00
Manman Ren 19c7bbe3b7 [CXX TLS calling convention] Add CXX TLS calling convention.
This commit adds a new target-independent calling convention for C++ TLS
access functions. It aims to minimize overhead in the caller by perserving as
many registers as possible.

The target-specific implementation for X86-64 is defined as following:
  Arguments are passed as for the default C calling convention
  The same applies for the return value(s)
  The callee preserves all GPRs - except RAX and RDI

The access function makes C-style TLS function calls in the entry and exit
block, C-style TLS functions save a lot more registers than normal calls.
The added calling convention ties into the existing implementation of the
C-style TLS functions, so we can't simply use existing calling conventions
such as preserve_mostcc.

rdar://9001553

llvm-svn: 254737
2015-12-04 17:40:13 +00:00
Alexey Bataev 7cf324772f LEA code size optimization pass (Part 1): Remove redundant address recalculations, by Andrey Turetsky
Add new x86 pass which replaces address calculations in load or store instructions with def register of existing LEA (must be in the same basic block), if the LEA calculates address that differs only by a displacement. Works only with -Os or -Oz.
Differential Revision: http://reviews.llvm.org/D13294

llvm-svn: 254712
2015-12-04 10:53:15 +00:00
NAKAMURA Takumi a3561b388c Move llvm/test/CodeGen/Generic/function-alias.ll to X86. It is incompatible to PECOFF.
FIXME: It may be ELF-generic.
llvm-svn: 254685
2015-12-04 02:00:12 +00:00
Quentin Colombet 901f036353 [ARM] When a bitcast is about to be turned into a VMOVDRR, try to combine it
with its source instead of forcing the values on GPRs.

This improves the lowering of vector code when such bitcasts happen in the
middle of vector computations.

rdar://problem/23691584 

llvm-svn: 254684
2015-12-04 01:53:14 +00:00
Matthias Braun 97d0ffbe06 ScheduleDAGInstrs: Rework schedule graph builder.
Re-comitting with a change that avoids undefined uses getting put into
the VRegUses list.

The new algorithm remembers the uses encountered while walking backwards
until a matching def is found. Contrary to the previous version this:
- Works without LiveIntervals being available
- Allows to increase the precision to subregisters/lanemasks
  (not used for now)

The changes in the AMDGPU tests are necessary because the R600 scheduler
is not stable with respect to the order of nodes in the ready queues.

Differential Revision: http://reviews.llvm.org/D9068

llvm-svn: 254683
2015-12-04 01:51:19 +00:00
JF Bastien 580b6572b5 X86InstrInfo::copyPhysReg: workaround reg liveness
Summary:
computeRegisterLiveness and analyzePhysReg are currently getting
confused about liveness in some cases, breaking copyPhysReg's
calculation of whether AX is dead in some cases. Work around this issue
temporarily by assuming that AX is always live.

See detail in: https://llvm.org/bugs/show_bug.cgi?id=25033#c7
And associated bugs PR24535 PR25033 PR24991 PR24992 PR25201.

This workaround makes the code correct but slightly inefficient, but it
seems to confuse the machine instr verifier which now things EAX was
undefined in some cases where it's being conservatively saved /
restored.

Reviewers: majnemer, sanjoy
Subscribers: llvm-commits
Differential Revision: http://reviews.llvm.org/D15198

llvm-svn: 254680
2015-12-04 01:18:17 +00:00
Evgeniy Stepanov 7fc3cb5919 Fix function-alias.ll test on non-X86 targets.
llvm-svn: 254676
2015-12-04 00:57:25 +00:00
Evgeniy Stepanov 2bb9c5ca22 Emit function alias to data as a function symbol.
CFI emits jump slots for indirect functions as a byte array
constant, and declares function-typed aliases to these constants.

This change fixes AsmPrinter to emit these aliases as function
symbols and not data symbols.

llvm-svn: 254674
2015-12-04 00:45:43 +00:00
JF Bastien 1ac69947b6 CodeGen peephole: fold redundant phys reg copies
Code generation often exposes redundant physical register copies through
virtual registers such as:

  %vreg = COPY %PHYSREG
  ...
  %PHYSREG = COPY %vreg

There are cases where no intervening clobber of %PHYSREG occurs, and the
later copy could therefore be removed. In some cases this further allows
us to remove the initial copy.

This patch contains a motivating example which comes from the x86 build
of Chrome, specifically cc::ResourceProvider::UnlockForRead uses
libstdc++'s implementation of hash_map. That example has two tests live
at the same time, and after machine sinking LLVM has confused itself
enough and things spilling EFLAGS is a great idea even though it's
never restored and the comparison results are both live.

Before this patch we have:
  DEC32m %RIP, 1, %noreg, <ga:@L>, %noreg, %EFLAGS<imp-def>
  %vreg1<def> = COPY %EFLAGS; GR64:%vreg1
  %EFLAGS<def> = COPY %vreg1; GR64:%vreg1
  JNE_1 <BB#1>, %EFLAGS<imp-use>

Both copies are useless. This patch tries to eliminate the later copy in
a generic manner.

dec is especially confusing to LLVM when compared with sub.

I wrote this patch to treat all physical registers generically, but only
remove redundant copies of non-allocatable physical registers because
the allocatable ones caused issues (e.g. when calling conventions weren't
properly modeled) and should be handled later by the register allocator
anyways.

The following tests used to failed when the patch also replaced allocatable
registers:
  CodeGen/X86/StackColoring.ll
  CodeGen/X86/avx512-calling-conv.ll
  CodeGen/X86/copy-propagation.ll
  CodeGen/X86/inline-asm-fpstack.ll
  CodeGen/X86/musttail-varargs.ll
  CodeGen/X86/pop-stack-cleanup.ll
  CodeGen/X86/preserve_mostcc64.ll
  CodeGen/X86/tailcallstack64.ll
  CodeGen/X86/this-return-64.ll
This happens because COPY has other special meaning for e.g. dependency
breakage and x87 FP stack.

Note that all other backends' tests pass.

Reviewers: qcolombet
Subscribers: llvm-commits
Differential Revision: http://reviews.llvm.org/D15157

llvm-svn: 254665
2015-12-03 23:43:56 +00:00
Dan Gohman 391a98afd5 [WebAssembly] Fix dominance check for PHIs in the StoreResult pass
When a block has no terminator instructions, getFirstTerminator() returns
end(), which can't be used in dominance checks. Check dominance for phi
operands separately.

Also, remove some bits from WebAssemblyRegStackify.cpp that were causing
trouble on the same testcase; they were left behind from an earlier
experiment.

Differential Revision: http://reviews.llvm.org/D15210

llvm-svn: 254662
2015-12-03 23:07:03 +00:00
Reid Kleckner 93fc520339 [X86] Put no-op ADJCALLSTACK markers around all dynamic lowerings
Summary:
These ADJCALLSTACK markers don't generate code, but they keep dynamic
alloca code that calls chkstk out of the prologue.

This slightly pessimizes inalloca calls by preventing some register copy
coalescing, but I can live with that.

Reviewers: qcolombet

Subscribers: hans, llvm-commits

Differential Revision: http://reviews.llvm.org/D15200

llvm-svn: 254645
2015-12-03 20:46:59 +00:00
Andrew Kaylor 92b3b16ba3 Move branch folding test to a better location.
llvm-svn: 254640
2015-12-03 19:41:25 +00:00
Matthias Braun 0d4505c067 AArch64FastISel: Use cbz/cbnz to branch on i1
In the case of a conditional branch without a preceding cmp we used to emit
a "and; cmp; b.eq/b.ne" sequence, use tbz/tbnz instead.

Differential Revision: http://reviews.llvm.org/D15122

llvm-svn: 254621
2015-12-03 17:19:58 +00:00
Tom Stellard 9760f03757 AMDGPU/SI: Emit constant arrays in the .hsrodata_readonly_agent section
Summary: This is done only when targeting HSA.

Reviewers: arsenm

Subscribers: arsenm, llvm-commits

Differential Revision: http://reviews.llvm.org/D13807

llvm-svn: 254587
2015-12-03 03:34:32 +00:00
Matthias Braun 2fd672a221 Revert "ScheduleDAGInstrs: Rework schedule graph builder."
This works mostly fine but breaks some stage 1 builders when compiling
compiler-rt on i386. Revert for further investigation as I can't see an
obvious cause/fix.

This reverts commit r254577.

llvm-svn: 254586
2015-12-03 03:01:10 +00:00
Matthias Braun d35fe3d984 ScheduleDAGInstrs: Rework schedule graph builder.
The new algorithm remembers the uses encountered while walking backwards
until a matching def is found. Contrary to the previous version this:
- Works without LiveIntervals being available
- Allows to increase the precision to subregisters/lanemasks
  (not used for now)

The changes in the AMDGPU tests are necessary because the R600 scheduler
is not stable with respect to the order of nodes in the ready queues.

Differential Revision: http://reviews.llvm.org/D9068

llvm-svn: 254577
2015-12-03 02:05:27 +00:00
Derek Schuff 5268aaf7b6 [WebAssembly] Add a test for wasm-store-results pass
Differential Revision: http://reviews.llvm.org/D15167

llvm-svn: 254570
2015-12-03 00:50:30 +00:00
Kyle Butt 2f713eb438 Tests: PPC: remove unnecessary metadata. NFC
Remove unnecessary metadata from a test case.

llvm-svn: 254544
2015-12-02 21:08:03 +00:00
Tom Stellard 00f2f91af4 AMDGPU/SI: Correctly emit agent global segment variables when targeting HSA
Differential Revision: http://reviews.llvm.org/D14508

llvm-svn: 254540
2015-12-02 19:47:57 +00:00
Kyle Butt cf6a8bfe51 [CodeGen]: Fix bad interaction with AntiDep breaking and inline asm.
AggressiveAntiDepBreaker was renaming registers specified by the user
for inline assembly. While this will work for compiler-specified
registers, it won't work for user-specified registers, and at the time
this runs, I don't currently see a way to distinguish them.

llvm-svn: 254532
2015-12-02 18:58:51 +00:00
Tim Northover f520eff782 AArch64: use ldxp/stxp pair to implement 128-bit atomic loads.
The ARM ARM is clear that 128-bit loads are only guaranteed to have been atomic
if there has been a corresponding successful stxp. It's less clear for AArch32, so
I'm leaving that alone for now.

llvm-svn: 254524
2015-12-02 18:12:57 +00:00
Tom Stellard e3b5aeaf83 AMDGPU/SI: Don't emit group segment global variables
Summary: Only global or readonly segment variables should appear in object files.

Reviewers: arsenm

Subscribers: arsenm, llvm-commits

Differential Revision: http://reviews.llvm.org/D15111

llvm-svn: 254519
2015-12-02 17:00:42 +00:00
Christof Douma 8b5dc2c94e [AArch64]: Add support for Cortex-A35
Adds support for the new Cortex-A35 ARMv8-A core.

llvm-svn: 254503
2015-12-02 11:53:44 +00:00
Nemanja Ivanovic 74e31bc929 Patch to fix a crash in the PowerPC back end due to ISD::ROTL and ISD::ROTR
not being expanded. Test case included.

llvm-svn: 254501
2015-12-02 10:36:24 +00:00
Simon Pilgrim 3fc3454a0c [X86][FMA] Optimize FNEG(FMUL) Patterns
On FMA targets, we can avoid having to load a constant to negate a float/double multiply by instead using a FNMSUB (-(X*Y)-0)

Fix for PR24366

Differential Revision: http://reviews.llvm.org/D14909

llvm-svn: 254495
2015-12-02 09:07:55 +00:00
Asaf Badouh 2489f350c0 [X86][AVX512] add comi with Sae
add builtin_ia32_vcomisd and builtin_ia32_vcomisd

Differential Revision: http://reviews.llvm.org/D14331

llvm-svn: 254493
2015-12-02 08:17:51 +00:00
Quentin Colombet f1e91c8bf1 [X86] Make sure the prologue does not clobber EFLAGS when it lives accross it.
This is a superset of the fix done in r254448.

This fixes PR25607.

llvm-svn: 254478
2015-12-02 01:22:54 +00:00
Tim Northover f3be9d5c0b AArch64: fix 128-bit shifts
We mustn't introduce a shift of exactly 64-bits for any inputs, since that's an
UNDEF value (and worse, it's not what you want with the natural Arch64
implementation).

The generated code is pretty horrific, but I couldn't come up with an obviously
better alternative (if the amount is constant EXTR could help). Turns out
128-bit shifts are just nasty.

rdar://22491037

llvm-svn: 254475
2015-12-02 00:33:54 +00:00
Matt Arsenault 592d068198 AMDGPU: Error on addrspacecasts that aren't actually implemented
llvm-svn: 254469
2015-12-01 23:04:05 +00:00
Matt Arsenault f9bfeafd00 AMDGPU: Implement isNoopAddrSpaceCast
llvm-svn: 254468
2015-12-01 23:04:00 +00:00
Quentin Colombet 9cb01aa30a [X86] Make sure the prologue does not clobber EFLAGS when it lives accross it.
This fixes PR25629.

llvm-svn: 254448
2015-12-01 19:49:31 +00:00
Artyom Skrobov 5d1f2524a0 Fix Thumb1 epilogue generation
Summary:
This had been broken for a very long time, but nobody noticed until
D14357 enabled shrink-wrapping by default.

Reviewers: jroelofs, qcolombet

Subscribers: tyomitch, llvm-commits, rengolin

Differential Revision: http://reviews.llvm.org/D14986

llvm-svn: 254444
2015-12-01 19:25:11 +00:00
Weiming Zhao 56ab51870c [AArch64] Fix a corner case in BitFeild select
Summary:
When not useful bits, BitWidth becomes 0 and APInt will not be happy.

See https://llvm.org/bugs/show_bug.cgi?id=25571

We can just mark the operand as IMPLICIT_DEF is none bits of it is used.

Reviewers: t.p.northover, jmolloy

Subscribers: gberry, jmolloy, mgrang, aemerson, llvm-commits, rengolin

Differential Revision: http://reviews.llvm.org/D14803

llvm-svn: 254440
2015-12-01 19:17:49 +00:00
Elena Demikhovsky aa1f17ea95 AVX-512: regenerated test for avx512 arithmetics, NFC
llvm-svn: 254410
2015-12-01 12:35:03 +00:00
Yury Gribov d7dbb66eb8 Introduce new @llvm.get.dynamic.area.offset.i{32, 64} intrinsics.
The @llvm.get.dynamic.area.offset.* intrinsic family is used to get the offset
from native stack pointer to the address of the most recent dynamic alloca on
the caller's stack. These intrinsics are intendend for use in combination with
@llvm.stacksave and @llvm.restore to get a pointer to the most recent dynamic
alloca. This is useful, for example, for AddressSanitizer's stack unpoisoning
routines.

Patch by Max Ostapenko.

Differential Revision: http://reviews.llvm.org/D14983

llvm-svn: 254404
2015-12-01 11:40:55 +00:00
Cong Hou d97c100dc4 Replace all weight-based interfaces in MBB with probability-based interfaces, and update all uses of old interfaces.
(This is the second attempt to submit this patch. The first caused two assertion
 failures and was reverted. See https://llvm.org/bugs/show_bug.cgi?id=25687)

The patch in http://reviews.llvm.org/D13745 is broken into four parts:

1. New interfaces without functional changes (http://reviews.llvm.org/D13908).
2. Use new interfaces in SelectionDAG, while in other passes treat probabilities
as weights (http://reviews.llvm.org/D14361).
3. Use new interfaces in all other passes.
4. Remove old interfaces.

This patch is 3+4 above. In this patch, MBB won't provide weight-based
interfaces any more, which are totally replaced by probability-based ones.
The interface addSuccessor() is redesigned so that the default probability is
unknown. We allow unknown probabilities but don't allow using it together
with known probabilities in successor list. That is to say, we either have a
list of successors with all known probabilities, or all unknown
probabilities. In the latter case, we assume each successor has 1/N
probability where N is the number of successors. An assertion checks if the
user is attempting to add a successor with the disallowed mixed use as stated
above. This can help us catch many misuses.

All uses of weight-based interfaces are now updated to use probability-based
ones.


Differential revision: http://reviews.llvm.org/D14973

llvm-svn: 254377
2015-12-01 05:29:22 +00:00
Hans Wennborg 1dbaf67537 Revert r254348: "Replace all weight-based interfaces in MBB with probability-based interfaces, and update all uses of old interfaces."
and the follow-up r254356: "Fix a bug in MachineBlockPlacement that may cause assertion failure during BranchProbability construction."

Asserts were firing in Chromium builds. See PR25687.

llvm-svn: 254366
2015-12-01 03:49:42 +00:00
Cong Hou fa1917c673 Replace all weight-based interfaces in MBB with probability-based interfaces, and update all uses of old interfaces.
The patch in http://reviews.llvm.org/D13745 is broken into four parts:

1. New interfaces without functional changes (http://reviews.llvm.org/D13908).
2. Use new interfaces in SelectionDAG, while in other passes treat probabilities
as weights (http://reviews.llvm.org/D14361).
3. Use new interfaces in all other passes.
4. Remove old interfaces.

This patch is 3+4 above. In this patch, MBB won't provide weight-based
interfaces any more, which are totally replaced by probability-based ones.
The interface addSuccessor() is redesigned so that the default probability is
unknown. We allow unknown probabilities but don't allow using it together
with known probabilities in successor list. That is to say, we either have a
list of successors with all known probabilities, or all unknown
probabilities. In the latter case, we assume each successor has 1/N
probability where N is the number of successors. An assertion checks if the
user is attempting to add a successor with the disallowed mixed use as stated
above. This can help us catch many misuses.

All uses of weight-based interfaces are now updated to use probability-based
ones.


Differential revision: http://reviews.llvm.org/D14973

llvm-svn: 254348
2015-12-01 00:02:51 +00:00
Simon Pilgrim db26b3ddfa [X86][FMA4] Prefer FMA4 to FMA
We currently output FMA instructions on targets which support both FMA4 + FMA (i.e. later Bulldozer CPUS bdver2/bdver3/bdver4).

This patch flips this so FMA4 is preferred; this is for several reasons:

1 - FMA4 is non-destructive reducing the need for mov instructions.
2 - Its more straighforward to commute and fold inputs (although the recent work on FMA has reduced this difference).
3 - All supported targets have FMA4 performance equal or better to FMA - Piledriver (bdver2) in particular has half the throughput when executing FMA instructions.

Its looks like no future AMD processor lines will support FMA4 after the Bulldozer series so we're not causing problems for later CPUs.

Differential Revision: http://reviews.llvm.org/D14997

llvm-svn: 254339
2015-11-30 22:22:06 +00:00
Paul Robinson a2550a6da3 Have 'optnone' respect the -fast-isel=false option.
This is primarily useful for debugging optnone v. ISel issues.

Differential Revision: http://reviews.llvm.org/D14792

llvm-svn: 254335
2015-11-30 21:56:16 +00:00
Cong Hou eb9c7056f0 [X86] Update test/CodeGen/X86/avg.ll with the help of update_llc_test_checks.py. NFC.
llvm-svn: 254334
2015-11-30 21:46:08 +00:00
Matt Arsenault 26f8f3db39 AMDGPU: Rework how private buffer passed for HSA
If we know we have stack objects, we reserve the registers
that the private buffer resource and wave offset are passed
and use them directly.

If not, reserve the last 5 SGPRs just in case we need to spill.
After register allocation, try to pick the next available registers
instead of the last SGPRs, and then insert copies from the inputs
to the reserved registers in the progloue.

This also only selectively enables all of the input registers
which are really required instead of always enabling them.

llvm-svn: 254331
2015-11-30 21:16:03 +00:00
Matt Arsenault 0e3d38937e AMDGPU: Remove SIPrepareScratchRegs
It does not work because of emergency stack slots.
This pass was supposed to eliminate dummy registers for the
spill instructions, but the register scavenger can introduce
more during PrologEpilogInserter, so some would end up
left behind if they were needed.

The potential for spilling the scratch resource descriptor
and offset register makes doing something like this
overly complicated. Reserve registers to use for the resource
descriptor and use them directly in eliminateFrameIndex.

Also removes creating another scratch resource descriptor
when directly selecting scratch MUBUF instructions.

The choice of which registers are reserved is temporary.
For now it attempts to pick the next available registers
after the user and system SGPRs.

llvm-svn: 254329
2015-11-30 21:15:53 +00:00
Matt Arsenault ff6da2fe89 AMDGPU: Use assert zext for workgroup sizes
llvm-svn: 254328
2015-11-30 21:15:45 +00:00
Quentin Colombet cdad10f333 [ARM] For old thumb ISA like v4t, we cannot use PC directly in pop.
Fix the epilogue emission to account for that.

llvm-svn: 254325
2015-11-30 20:37:58 +00:00
David Majnemer bf4119faf6 [X86] Add RIP to GR64_TCW64
The MachineVerifier wants to check that the register operands of an
instruction belong to the instruction's register class.  RIP-relative
control flow instructions violated this by referencing RIP.  While this
was fixed for SysV, it was never fixed for Win64.

llvm-svn: 254315
2015-11-30 19:04:19 +00:00
Kit Barton f4ce2f3a9e Enable shrink wrapping for PPC64
Re-enable shrink wrapping for PPC64 Little Endian.

One minor modification to PPCFrameLowering::findScratchRegister was necessary to handle fall-thru blocks (blocks with no terminator) correctly.

Tested with all LLVM test, clang tests, and the self-hosting build, with no problems found.

PHabricator: http://reviews.llvm.org/D14778
llvm-svn: 254314
2015-11-30 18:59:41 +00:00
Matt Arsenault ea03cf2fa1 AMDGPU: Don't reserve SCRATCH_PTR input register
This hasn't been doing anything since using relocations was added.

llvm-svn: 254304
2015-11-30 15:46:47 +00:00
Igor Breger ea7932cfb7 AVX512: regenerate avx512bw intrincics tests results.
Differential Revision: http://reviews.llvm.org/D15069

llvm-svn: 254295
2015-11-30 10:40:52 +00:00
Craig Topper ecae476e4c [X86] int_x86_avx2_permps and X86ISD::VPERMV should take an integer vector for its shuffle indices.
llvm-svn: 254269
2015-11-29 22:53:22 +00:00
Simon Pilgrim 88aa627c0b [X86][SSE] Added support for lowering to ADDSUBPS/ADDSUBPD with commuted inputs
We could already recognise shuffle(FSUB, FADD) -> ADDSUB, this allow us to recognise shuffle(FADD, FSUB) -> ADDSUB by commuting the shuffle mask prior to matching.

llvm-svn: 254259
2015-11-29 16:41:04 +00:00
Simon Pilgrim 4c5ab52a54 [X86][AVX] Regenerate ADDSUB tests
Tidied up triple and regenerate tests using update_llc_test_checks.py

llvm-svn: 254237
2015-11-28 19:20:49 +00:00
Renato Golin 5dbc8a5283 Revert "[ARM] Generate ABI_optimization_goals build attribute, as described in the ARM ARM."
This reverts commit r254201 and r254202, as it broke test-suite,
self-hosting and sanitizer tests on ARM buildbots.

llvm-svn: 254234
2015-11-28 17:23:46 +00:00
Simon Pilgrim d9bb73b236 [X86][FMA] Added 512-bit tests to match 128/256-bit tests coverage
As discussed on D14909

llvm-svn: 254233
2015-11-28 16:04:24 +00:00
Simon Pilgrim 82f663d755 [X86][FMA] More thorough FMA tests
Added FMADD/FMSUB/FNMADD/FNMSUB tests for all types

Added load folding tests for 512-bit vectors

NOTE: Many of the AVX512 FMA instructions don't yet commute/fold correctly

As discussed on D14909

llvm-svn: 254232
2015-11-28 14:28:44 +00:00
Simon Pilgrim 29412ee45f [X86][AVX2] Tidied up PBROADCAST tests
Tidied up triple and regenerate tests using update_llc_test_checks.py

llvm-svn: 254231
2015-11-28 14:15:40 +00:00
NAKAMURA Takumi 50024613e7 llvm/test/CodeGen/SystemZ/alloca-04.ll REQUIRES asserts due to -debug-pass.
llvm-svn: 254230
2015-11-28 13:05:49 +00:00
Jonas Paulsson f12b925bb1 [Stack realignment] Handling of aligned allocas.
This patch implements dynamic realignment of stack objects for targets
with a non-realigned stack pointer. Behaviour in FunctionLoweringInfo
is changed so that for a target that has StackRealignable set to
false, over-aligned static allocas are considered to be variable-sized
objects and are handled with DYNAMIC_STACKALLOC nodes.

It would be good to group aligned allocas into a single big alloca as
an optimization, but this is yet todo.

SystemZ benefits from this, due to its stack frame layout.

New tests SystemZ/alloca-03.ll for aligned allocas, and
SystemZ/alloca-04.ll for "no-realign-stack" attribute on functions.

Review and help from Ulrich Weigand and Hal Finkel.

llvm-svn: 254227
2015-11-28 11:02:32 +00:00
Artyom Skrobov b955b90509 [ARM] Generate ABI_optimization_goals build attribute, as described in the ARM ARM.
Summary:
Since this build attribute corresponds to a whole module, and
different functions in a module may differ in the optimizations
enabled for them, this attribute is emitted after all functions,
and only in the case that the optimization goals for all
functions match.

Reviewers: logan, hans

Subscribers: aemerson, rengolin, llvm-commits

Differential Revision: http://reviews.llvm.org/D14934

llvm-svn: 254201
2015-11-27 15:30:51 +00:00
Simon Pilgrim 1d881ae225 [X86][FMA] Begun adding AVX512 FMA tests
As discussed on D14909

llvm-svn: 254180
2015-11-26 20:53:28 +00:00
Krzysztof Parzyszek 4eb6d4d1f2 [Hexagon] Hexagon V60 HVX intrinsic defintions
Author: Ron Lieberman <ronl@codeaurora.org>
llvm-svn: 254165
2015-11-26 16:54:33 +00:00
Martell Malone d12292480a ARM: address WOA unsigned division overflow crash
Building on r253865 the crash is not limited to signed overflows.

Disable custom handling of unsigned 32-bit and 64-bit integer divide.
Add test cases for both 32-bit and 64-bit unsigned integer overflow.

llvm-svn: 254158
2015-11-26 15:34:03 +00:00
Daniel Sanders fbb6a237ba [mips][ias] Explicitly disable IAS on tests that depend on not assembling.
Summary:
no-odd-spreg-msa.ll: This test deliberately uses an odd-numbered register
in inline assembly and expects the compiler to insert a move to an
even-numbered register.

inlineasm-operand-code.ll and inlineasm_constraint.ll:
Checks for IAS's output will be added once a matcher bug is resolved. This bug
causes the canonical output emitted by IAS to be incorrect for uimm16 constants
with the MSB set. We will still need the non-IAS checks at this point since
these tests primarily test formatting of operands.

Reviewers: vkalintiris

Subscribers: dsanders, llvm-commits

Differential Revision: http://reviews.llvm.org/D14705

llvm-svn: 254148
2015-11-26 11:23:03 +00:00
Daniel Sanders f2b5bff843 [mips][ias] Replace anchor comments with anchor instructions in tests.
Summary:
This is because IAS will delete the comments. NFC at the moment but it will
prevent a failure once IAS is the default.

Reviewers: vkalintiris

Subscribers: llvm-commits, dsanders

Differential Revision: http://reviews.llvm.org/D14704

llvm-svn: 254147
2015-11-26 10:26:18 +00:00
Vyacheslav Klochkov ed865dfcc5 X86-FMA3: Improved/enabled the memory folding optimization for scalar loads
generated for _mm_losd_s{s,d}() intrinsics and used in scalar FMAs generated 
for FMA intrinsics _mm_f{madd,msub,nmadd,nmsub}_s{s,d}().

Reviewer: David Kreitzer
Differential Revision: http://reviews.llvm.org/D14762

llvm-svn: 254140
2015-11-26 07:45:30 +00:00
Tom Stellard 48f29f21ee AMDGPU: Add llvm.amdgcn.dispatch.ptr intrinsic
Summary:
This returns a pointer to the dispatch packet, which can be used to load
information about the kernel dispach.

Reviewers: arsenm

Subscribers: arsenm, llvm-commits

Differential Revision: http://reviews.llvm.org/D14898

llvm-svn: 254116
2015-11-26 00:43:29 +00:00
Dan Gohman a774d719a0 [WebAssembly] Fix inline asm support for i64 operands.
llvm-svn: 254106
2015-11-25 22:28:50 +00:00
Dan Gohman d9b4218831 [WebAssembly] Fold setne and seteq comparisons into selects.
llvm-svn: 254104
2015-11-25 22:13:48 +00:00
Marek Olsak 7ed6b2f414 AMDGPU/SI: select S_ABS_I32 when possible (v2)
v2: added more tests, moved the SALU->VALU conversion to a separate function

It looks like it's not possible to get subregisters in the S_ABS lowering
code, and I don't feel like guessing without testing what the correct code
would look like.

llvm-svn: 254095
2015-11-25 21:22:45 +00:00
Krzysztof Parzyszek 207c13f254 Add hexagonv55 and hexagonv60 as recognized CPUs, make v60 the default
llvm-svn: 254089
2015-11-25 20:30:59 +00:00
Matt Arsenault d179481857 AMDGPU: Add some tests for promotion of v2i64 scalar_to_vector
llvm-svn: 254087
2015-11-25 20:01:03 +00:00
Matt Arsenault 61001bbc03 AMDGPU: Make v2i64/v2f64 legal types.
They can be loaded and stored, so count them as legal. This is
mostly to fix a number of common cases for load/store merging.

llvm-svn: 254086
2015-11-25 19:58:34 +00:00
Dan Gohman fb3e0594e4 [WebAssembly] Use a physical register to describe ARGUMENT liveness.
Instead of trying to move ARGUMENT instructions back up to the top after
they've been scheduled or sunk down, use a fake physical register to
create a liveness constraint that prevents ARGUMENT instructions from
moving down in the first place. This is still not entirely ideal, however
it is more robust than letting them move and moving them back.

llvm-svn: 254084
2015-11-25 19:36:19 +00:00
Dan Gohman 1270b0a91d [WebAssembly] Make several tests more strict.
llvm-svn: 254077
2015-11-25 17:33:15 +00:00
Dan Gohman 81719f8555 [WebAssembly] Support for register stackifying with load and store instructions.
llvm-svn: 254076
2015-11-25 16:55:01 +00:00
Dan Gohman 2c8fe6a428 [WebAssembly] Codegen support for ISD::ExternalSymbol
llvm-svn: 254075
2015-11-25 16:44:29 +00:00
Hal Finkel 005f840959 [PowerPC] Don't generate mfocrf on the e500mc
The e500mc does not actually support the mfocrf instruction; update the
processor definitions to reflect that fact.

Patch by Tom Rix (with some test-case cleanup by me).

llvm-svn: 254064
2015-11-25 10:14:31 +00:00
Eric Christopher f83a2c2db2 Accept any stack offset, including none, here.
llvm-svn: 254062
2015-11-25 09:21:36 +00:00
Eric Christopher 4675c439aa Fix some places where we were assuming that memory type had been legalized
to a simple type when lowering a truncating store of a vector type. In this
case for an EVT we'll return Expand as we should in all of the cases anyhow.

The testcase triggered at the one in VectorLegalizer::LegalizeOp, inspection
found the rest.

llvm-svn: 254061
2015-11-25 09:11:53 +00:00
Simon Pilgrim c85c49c665 [X86][AVX] Regenerate Splat OptSize tests
Tidied up triple and regenerate tests using update_llc_test_checks.py

llvm-svn: 254060
2015-11-25 09:06:17 +00:00
Elena Demikhovsky f07df9fcac AVX-512: Fixed a bug in VPERMT2* intrinsic.
It was wrong order of operands (from intrinsic to DAG node).
I added more strict type specification for instruction selection.

Differential Revision: http://reviews.llvm.org/D14942

llvm-svn: 254059
2015-11-25 08:17:56 +00:00
Hans Wennborg e412b71f95 Revert r253528: "[X86] Enable shrink-wrapping by default."
This caused PR25607 and also caused Chromium to crash on start-up.

(Also had to update test/CodeGen/X86/avx-splat.ll, which was committed
after shrink wrapping was enabled.)

llvm-svn: 254044
2015-11-25 00:05:13 +00:00
Simon Pilgrim c1225c28e1 [X86][SSE] Regenerate PMUL tests
Tidied up triple and regenerate tests using update_llc_test_checks.py

llvm-svn: 254029
2015-11-24 22:09:31 +00:00
Simon Pilgrim 1b4fecb098 [X86][FMA] Optimize FNEG(FMA) Patterns
X86 needs to use its own FMA opcodes, preventing the standard FNEG(FMA) pattern table recognition method used by other platforms. This patch adds support for lowering FNEG(FMA(X,Y,Z)) into a single suitably negated FMA instruction.

Fix for PR24364

Differential Revision: http://reviews.llvm.org/D14906

llvm-svn: 254016
2015-11-24 20:31:46 +00:00
Cong Hou db6220f84d [X86] Fix several issues related to X86's psadbw instruction.
This patch fixes the following issues:

1. Fix the return type of X86psadbw: it should not be the same type of inputs.
   For vNi8 inputs the output should be vMi64, where M = N/8.
2. Fix the return type of int_x86_avx512_psad_bw_512 accordingly.
3. Fix the definiton of PSADBW, VPSADBW, and VPSADBWY accordingly.
4. Adjust the return type when building a DAG node of X86ISD::PSADBW type.
5. Update related tests.


Differential revision: http://reviews.llvm.org/D14897

llvm-svn: 254010
2015-11-24 19:51:26 +00:00
Sanjay Patel a0d354541d [x86] remove duplicate movq instruction defs (PR25554)
We had duplicated definitions for the same hardware '[v]movq' instructions. For example with SSE:

  def MOVZQI2PQIrr : RS2I<0x6E, MRMSrcReg, (outs VR128:$dst), (ins GR64:$src),
                     "mov{d|q}\t{$src, $dst|$dst, $src}", // X86-64 only
                     [(set VR128:$dst, (v2i64 (X86vzmovl (v2i64 (scalar_to_vector GR64:$src)))))],
                     IIC_SSE_MOVDQ>;

  def MOV64toPQIrr : RS2I<0x6E, MRMSrcReg, (outs VR128:$dst), (ins GR64:$src),
                     "mov{d|q}\t{$src, $dst|$dst, $src}",
                     [(set VR128:$dst, (v2i64 (scalar_to_vector GR64:$src)))],
                     IIC_SSE_MOVDQ>, Sched<[WriteMove]>;

As shown in the test case and PR25554:
https://llvm.org/bugs/show_bug.cgi?id=25554

This causes us to miss reusing an operand because later passes don't know these 'movq' are the same instruction.
This patch deletes one pair of these defs.
Sadly, this won't fix the original test case in the bug report. Something else is still broken.

Differential Revision: http://reviews.llvm.org/D14941

llvm-svn: 253988
2015-11-24 15:44:35 +00:00
Matt Arsenault ff05da806c AMDGPU: Split LDS vector loads
If properly aligned this could allow using ds_read_b64.

llvm-svn: 253975
2015-11-24 12:18:54 +00:00
Matt Arsenault 4d801cd357 AMDGPU: Split x8 and x16 vector loads instead of scalarize
The one regression in the builtin tests is in the read2 test which now
(again) has many extra copies, but this should be solved once the pass
is replaced with a DAG combine.

llvm-svn: 253974
2015-11-24 12:05:03 +00:00
Cong Hou 1938f2eb98 Let SelectionDAG start to use probability-based interface to add successors.
The patch in http://reviews.llvm.org/D13745 is broken into four parts:

1. New interfaces without functional changes.
2. Use new interfaces in SelectionDAG, while in other passes treat probabilities
as weights.
3. Use new interfaces in all other passes.
4. Remove old interfaces.

This the second patch above. In this patch SelectionDAG starts to use
probability-based interfaces in MBB to add successors but other MC passes are
still using weight-based interfaces. Therefore, we need to maintain correct
weight list in MBB even when probability-based interfaces are used. This is
done by updating weight list in probability-based interfaces by treating the
numerator of probabilities as weights. This change affects many test cases
that check successor weight values. I will update those test cases once this
patch looks good to you.


Differential revision: http://reviews.llvm.org/D14361

llvm-svn: 253965
2015-11-24 08:51:23 +00:00
Cong Hou bed60d35ed [X86][SSE] Detect AVG pattern during instruction combine for SSE2/AVX2/AVX512BW.
This patch detects the AVG pattern in vectorized code, which is simply
c = (a + b + 1) / 2, where a, b, and c have the same type which are vectors of
either unsigned i8 or unsigned i16. In the IR, i8/i16 will be promoted to
i32 before any arithmetic operations. The following IR shows such an example:

%1 = zext <N x i8> %a to <N x i32>
%2 = zext <N x i8> %b to <N x i32>
%3 = add nuw nsw <N x i32> %1, <i32 1 x N>
%4 = add nuw nsw <N x i32> %3, %2
%5 = lshr <N x i32> %N, <i32 1 x N>
%6 = trunc <N x i32> %5 to <N x i8>

and with this patch it will be converted to a X86ISD::AVG instruction.

The pattern recognition is done when combining instructions just before type
legalization during instruction selection. We do it here because after type
legalization, it is much more difficult to do pattern recognition based
on many instructions that are doing type conversions. Therefore, for
target-specific instructions (like X86ISD::AVG), we need to take care of type
legalization by ourselves. However, as X86ISD::AVG behaves similarly to
ISD::ADD, I am wondering if there is a way to legalize operands and result
types of X86ISD::AVG together with ISD::ADD. It seems that the current design
doesn't support this idea.

Tests are added for SSE2, AVX2, and AVX512BW and both i8 and i16 types of
variant vector sizes.


Differential revision: http://reviews.llvm.org/D14761

llvm-svn: 253952
2015-11-24 05:44:19 +00:00
Sanjay Patel 8ca4a5b9e5 minimize test case but still show the bug
llvm-svn: 253940
2015-11-24 00:11:48 +00:00
Sanjay Patel 16fcf25eb9 added comment (using freshly updated update_llc_test_checks.py)
llvm-svn: 253935
2015-11-23 23:22:05 +00:00
Sanjay Patel d6e0cb01b1 [x86] add test to show suboptimal codegen (PR25554)
llvm-svn: 253934
2015-11-23 23:18:20 +00:00
Andy Ayers 9f7501896e findDeadCallerSavedReg needs to pay attention to calling convention
Caller saved regs differ between SysV and Win64. Use the tail call available set to scavenge from.

Refactor register info to create new helper to get at tail call GPRs. Added a new test case for windows. Fixed up a number of X64 tests since now RCX is preferred over RDX on SysV.

Differential Revision: http://reviews.llvm.org/D14878

llvm-svn: 253927
2015-11-23 22:17:44 +00:00
Dan Gohman 2f16f25391 [WebAssembly] Don't special-case call operand order.
With the '=' suffix now indicating which operands are output operands, it's
no longer as important to distinguish between a call's inputs and its outputs
using operand ordering, so we can go back to printing them in the normal order.

llvm-svn: 253925
2015-11-23 22:04:06 +00:00
Dan Gohman 700515fa92 [WebAssembly] Suffix output operands with '='.
This distinguishes input operands from output operands. This is something of
a syntactic experiment to see whether the mild amount of clutter this adds is
outweighed by the extra information it conveys to the reader.

llvm-svn: 253922
2015-11-23 21:55:57 +00:00