Commit Graph

19933 Commits

Author SHA1 Message Date
Petr Hosek c3a9e6db38 [AArch64] Allow global register asm("x18") or asm("w18") under -ffixed-x18
When using -ffixed-x18, the x18 (or w18) register can safely be used
with the "global register variable" GCC extension, but the backend
fails to recognize it.

Patch by Roland McGrath.

Differential Revision: https://reviews.llvm.org/D31793

llvm-svn: 299799
2017-04-07 20:41:58 +00:00
Simon Dardis f7e4388e3b Revert "[SelectionDAG] Enable target specific vector scalarization of calls and returns"
This reverts commit r299766. This change appears to have broken the MIPS
buildbots. Reverting while I investigate.

Revert "[mips] Remove usage of debug only variable (NFC)"

This reverts commit r299769. Follow up commit.

llvm-svn: 299788
2017-04-07 17:25:05 +00:00
Stanislav Mekhanoshin 478b81982f [AMDGPU] Unroll more to eliminate phis and conditions
Increase threshold to unroll a loop which contains an "if" statement
whose condition defined by a PHI belonging to the loop. This may help
to eliminate if region and potentially even PHI itself, saving on
both divergence and registers used for the PHI.

Add a small bonus for each of such "if" statements.

Differential Revision: https://reviews.llvm.org/D31693

llvm-svn: 299779
2017-04-07 16:26:28 +00:00
Dehao Chen 58fa724494 Use PMADDWD to expand reduction in a loop
Summary:
PMADDWD can help improve 8/16 bit integer mutliply-add operation performance for cases like:

for (int i = 0; i < count; i++)
  a += x[i] * y[i];

Reviewers: wmi, davidxl, hfinkel, RKSimon, zvi, mkuper

Reviewed By: mkuper

Subscribers: llvm-commits

Differential Revision: https://reviews.llvm.org/D31679

llvm-svn: 299776
2017-04-07 15:41:52 +00:00
Igor Breger 2953788c36 [GlobalISel] implement narrowing for G_CONSTANT.
Summary: [GlobalISel] implement narrowing for G_CONSTANT.

Reviewers: bogner, zvi, t.p.northover

Reviewed By: t.p.northover

Subscribers: llvm-commits, dberris, rovka, kristof.beyls

Differential Revision: https://reviews.llvm.org/D31744

llvm-svn: 299772
2017-04-07 14:41:59 +00:00
Petar Jovanovic bc54eb89ad [mips][msa] Fix generation of bm(n)zi and bins[lr]i instructions
We have two cases here, the first one being the following instruction
selection from the builtin function:
bm(n)zi builtin -> vselect node -> bins[lr]i machine instruction

In case of bm(n)zi having an immediate which has either its high or low bits
set, a bins[lr] instruction can be selected through the selectVSplatMask[LR]
function. The function counts the number of bits set, and that value is
being passed to the bins[lr]i instruction as its immediate, which in turn
copies immediate modulo the size of the element in bits plus 1 as per specs,
where we get the off-by-one-error.

The other case is:
bins[lr]i -> vselect node -> bsel.v

In this case, a bsel.v instruction gets selected with a mask having one bit
less set than required.

Patch by Stefan Maksimovic.

Differential Revision: https://reviews.llvm.org/D30579

llvm-svn: 299768
2017-04-07 13:31:36 +00:00
Simon Dardis 6470ff0b24 [SelectionDAG] Enable target specific vector scalarization of calls and returns
By target hookifying getRegisterType, getNumRegisters, getVectorBreakdown,
backends can request that LLVM to scalarize vector types for calls
and returns.

The MIPS vector ABI requires that vector arguments and returns are passed in
integer registers. With SelectionDAG's new hooks, the MIPS backend can now
handle LLVM-IR with vector types in calls and returns. E.g.
'call @foo(<4 x i32> %4)'.

Previously these cases would be scalarized for the MIPS O32/N32/N64 ABI for
calls and returns if vector types were not legal. If vector types were legal,
a single 128bit vector argument would be assigned to a single 32 bit / 64 bit
integer register.

By teaching the MIPS backend to inspect the original types, it can now
implement the MIPS vector ABI which requires a particular method of
scalarizing vectors.

Previously, the MIPS backend relied on clang to scalarize types such as "call
@foo(<4 x float> %a) into "call @foo(i32 inreg %1, i32 inreg %2, i32 inreg %3,
i32 inreg %4)".

This patch enables the MIPS backend to take either form for vector types.

Reviewers: zoran.jovanovic, jaydeep, vkalintiris, slthakur

Differential Revision: https://reviews.llvm.org/D27845

llvm-svn: 299766
2017-04-07 13:03:52 +00:00
Jonas Paulsson cad72efee6 [SystemZ] Check for presence of vector support in SystemZISelLowering
A test case was found with llvm-stress that caused DAGCombiner to crash
when compiling for an older subtarget without vector support.

SystemZTargetLowering::combineTruncateExtract() should do nothing for older
subtargets.

This check was placed in canTreatAsByteVector(), which also helps in a few
other places.

Review: Ulrich Weigand
llvm-svn: 299763
2017-04-07 12:35:11 +00:00
Diana Picus fed80723c0 [ARM] GlobalISel: Test hard float properly
It turns out -float-abi=hard doesn't set the hard float calling
convention for libcalls. We need to use a hard float triple instead
(e.g. gnueabihf).

llvm-svn: 299761
2017-04-07 12:04:24 +00:00
Sam Kolton 6e79529db4 [AMDGPU] Move SiShrinkInstruction and SDWAPeephole to SSAOptimization passes
Summary:
Difference beetween PreRegAlloc() and MachineSSAOptimization() are that the former is run despite of -O0 optimization level. In my undestanding SiShrinkInstructions and SDWAPeephole shouldn't run when optimizations are disabled.
With this change order of passes will not change.

Reviewers: arsenm, vpykhtin, rampitec

Subscribers: qcolombet, kzhuravl, wdng, nhaehnle, yaxunl, dstuttard, tpr, t-tye

Differential Revision: https://reviews.llvm.org/D31705

llvm-svn: 299757
2017-04-07 10:53:12 +00:00
Diana Picus 3c608448e1 [ARM] GlobalISel: Support frem for 64-bit values
Legalize to a libcall.

llvm-svn: 299756
2017-04-07 10:50:02 +00:00
Diana Picus a5bab61a8d [ARM] GlobalISel: Support frem for 32-bit values
Legalize to a libcall.
On this occasion, also start allowing soft float subtargets. For the
moment G_FREM is the only legal floating point operation for them.

llvm-svn: 299753
2017-04-07 09:41:39 +00:00
Konstantin Zhuravlyov 4b3847e865 AMDGPU/GFX9: Fix shared and private aperture queries
Differential Revision: https://reviews.llvm.org/D31786

llvm-svn: 299727
2017-04-06 23:02:33 +00:00
Eli Friedman 5fba1e53f2 Turn on -addr-sink-using-gep by default.
The new codepath has been in the tree for years, and there isn't any
reason to use two codepaths here.

Differential Revision: https://reviews.llvm.org/D30596

llvm-svn: 299723
2017-04-06 22:42:18 +00:00
Michael Kuperstein 6129887d21 [X86] Revert r299387 due to AVX legalization infinite loop.
llvm-svn: 299720
2017-04-06 22:33:25 +00:00
Matt Arsenault 21a438255d AMDGPU: Diagnose illegal SGPR to VGPR copies
This is possible in ways that are not compiler bugs,
so stop asserting on them.

This emits an extra error when emitting objects when it
can't encode the new pseudo, but I'm not sure that matters.

llvm-svn: 299712
2017-04-06 21:09:53 +00:00
Matt Arsenault 5cf4271883 AMDGPU: Replace fp16SrcZerosHighBits with a whitelist
FCOPYSIGN is lowered to bit operations which don't clear the high
bits.

llvm-svn: 299708
2017-04-06 20:58:30 +00:00
Huihui Zhang 98240e9643 [SelectionDAG] [ARM CodeGen] Fix chain information of LowerMUL
In LowerMUL, the chain information is not preserved for the new
created Load SDNode.

For example, if a Store alias with one of the operand of Mul.
The Load for that operand need to be scheduled before the Store.
The dependence is recorded in the chain of Store, in TokenFactor.
However, when lowering MUL, the SDNodes for the new Loads for
VMULL are not updated in the TokenFactor for the Store. Thus the
chain is not preserved for the lowered VMULL.

llvm-svn: 299701
2017-04-06 20:22:51 +00:00
Yi Kong 5e7059b702 Revert "[ARM] Add Kryo to available targets"
This reverts commit 942d6e6f58bf7e63810dd7cbcbce1fdfa5ebc6d4.

Build breakage.

llvm-svn: 299689
2017-04-06 19:16:14 +00:00
Nirav Dave 974f7c23ae [SDAG] Fix visitAND optimization to deal with vector extract case again.
Summary:
Fix case elided by rL298920.

Fixes PR32545.

Reviewers: eli.friedman, RKSimon

Subscribers: llvm-commits

Differential Revision: https://reviews.llvm.org/D31759

llvm-svn: 299688
2017-04-06 19:05:41 +00:00
Yi Kong 2b622b1fc1 [ARM] Add Kryo to available targets
Summary:
Host CPU detection now supports Kryo, so we need to recognize it in ARM
target.

Reviewers: mcrosier, t.p.northover, rengolin, echristo, srhines

Reviewed By: t.p.northover, echristo

Subscribers: aemerson

Differential Revision: https://reviews.llvm.org/D31775

llvm-svn: 299674
2017-04-06 18:10:08 +00:00
Stanislav Mekhanoshin ea57c38521 [AMDGPU] Eliminate barrier if workgroup size is not greater than wavefront size
If a workgroup size is known to be not greater than wavefront size
the s_barrier instruction is not needed since all threads are guarantied
to come to the same point at the same time.

Differential Revision: https://reviews.llvm.org/D31731

llvm-svn: 299659
2017-04-06 16:48:30 +00:00
Sam Kolton 9fa169601f [AMDGPU] Resubmit SDWA peephole: enable by default
Reviewers: vpykhtin, rampitec, arsenm

Subscribers: qcolombet, kzhuravl, wdng, nhaehnle, yaxunl, dstuttard, tpr, t-tye

Differential Revision: https://reviews.llvm.org/D31671

llvm-svn: 299654
2017-04-06 15:03:28 +00:00
Simon Pilgrim 77d3c770d3 [X86][MMX] Test showing failure to create MMX non-temporal store
llvm-svn: 299640
2017-04-06 10:32:30 +00:00
David Green 1b4b59a415 [ARM] Remove a dead ADD during the creation of TBBs
During the optimisation of jump tables in the constant island pass,
an extra ADD could be left over, now dead but not removed.

Differential Revision: https://reviews.llvm.org/D31389

llvm-svn: 299634
2017-04-06 08:32:47 +00:00
Ivan Krasin d4f70c70b9 Revert r299536. [AMDGPU] SDWA peephole: enable by default.
Reason: breaks multiple bots:

http://lab.llvm.org:8011/builders/sanitizer-x86_64-linux-fast/builds/3988
http://lab.llvm.org:8011/builders/sanitizer-x86_64-linux-bootstrap/builds/1173

Original Review URL: https://reviews.llvm.org/D31671

llvm-svn: 299583
2017-04-05 19:58:12 +00:00
Krzysztof Parzyszek 2182b4b7b3 [Hexagon] Use -mattr to select HVX mode in a testcase, NFC
llvm-svn: 299582
2017-04-05 19:46:37 +00:00
Adam Nemet d5ffdd3605 [DAGCombine] Support FMF contract in fused multiple-and-sub too
This is a follow-on to r299096 which added support for fmadd.

Subtract does not have the case where with two multiply operands we commute in
order to fuse with the multiply with the fewer uses.

llvm-svn: 299572
2017-04-05 17:58:48 +00:00
Renato Golin edfeb773fd [ARM] Try to re-enable MachineBranchProb.ll for ARM/AArch64
Commit r298799 changed code that made the XFAIL on MachineBranchProb.ll
irrelevant, but some configurations still failed. I can't reproduce it
locally, so I'm hoping that enabling this will tell me if some
configurations will really fail or if they were just too slow.

llvm-svn: 299558
2017-04-05 16:27:11 +00:00
Nirav Dave aa65a2beb8 [SystemZ] Prevent Merging Bitcast with non-normal loads
Fixes PR32505.

Reviewers: uweigand, jonpa

Subscribers: llvm-commits

Differential Revision: https://reviews.llvm.org/D31609

llvm-svn: 299552
2017-04-05 15:42:48 +00:00
Sanjay Patel b2f1621bb1 [DAGCombiner] add and use TLI hook to convert and-of-seteq / or-of-setne to bitwise logic+setcc (PR32401)
This is a generic combine enabled via target hook to reduce icmp logic as discussed in:
https://bugs.llvm.org/show_bug.cgi?id=32401

It's likely that other targets will want to enable this hook for scalar transforms, 
and there are probably other patterns that can use bitwise logic to reduce comparisons.

Note that we are missing an IR canonicalization for these patterns, and we will probably
prefer the pair-of-compares form in IR (shorter, more likely to fold).

Differential Revision: https://reviews.llvm.org/D31483

llvm-svn: 299542
2017-04-05 14:09:39 +00:00
Jonas Paulsson 38a2da92bc [DAGCombiner] Don't make a BUILD_VECTOR with operands of illegal type.
When DAGCombiner visits a SIGN_EXTEND_INREG of a BUILD_VECTOR with
constant operands, a new BUILD_VECTOR node will be created transformed
constants.

Llvm-stress found a case where the new BUILD_VECTOR had constant operands
of an illegal type, because the (legal) element type is in fact not a legal
scalar type.

This patch changes this so that the new BUILD_VECTOR has the same operand
type as the old one.

Review: Eli Friedman, Nirav Dave
https://bugs.llvm.org//show_bug.cgi?id=32422

llvm-svn: 299540
2017-04-05 13:45:37 +00:00
Sam Kolton 34e29784fb [AMDGPU] SDWA peephole: enable by default
Reviewers: vpykhtin, rampitec, arsenm

Subscribers: qcolombet, kzhuravl, wdng, nhaehnle, yaxunl, dstuttard, tpr, t-tye

Differential Revision: https://reviews.llvm.org/D31671

llvm-svn: 299536
2017-04-05 12:00:45 +00:00
Ahmed Bougacha ec8b1fb539 [X86] Relax assert in broadcast-of-subvector lowering.
Before r294774, there was a problem when lowering broadcasts to use
128-bit subvectors.

When we looked through a bitcast to find the broadcast input, we'd keep
using the original type, so you'd end up with things like:
  (v8f32 (broadcast
    (v4f32 (extract_subvector
      (v8i32 V),
      ...))
    ))

r294774 fixed it to always emit subvectors with the scalar type of the
original source.

It also introduced some asserts, to check that we use scalars with
the same size, and vectors with the same number of elements.

The scalar size equality is checked earlier when looking through bitcasts,
and is a useful assert.

However, the number of elements don't have to be identical: we're always
going to extract a 128-bit subvector, and we can have different size
inputs if we looked through a concat_vector to find a 256-bit source.

Relax the overzealous assert.

Replace it with a check of the original source vector being 256 or 512
bits.  If it's 128 bits, we can't extract_subvector from it.

Fixes PR32371.

llvm-svn: 299490
2017-04-05 00:14:39 +00:00
Ahmed Bougacha d3c03a5ddd [AArch64] Avoid partial register deps on insertelt of load into lane 0.
This improves upon r246462: that prevented FMOVs from being emitted
for the cross-class INSERT_SUBREGs by disabling the formation of
INSERT_SUBREGs of LOAD.  But the ld1.s that we started selecting
caused us to introduce partial dependencies on the vector register.

Avoid that by using SCALAR_TO_VECTOR: it's a first-class citizen that
is folded away by many patterns, including the scalar LDRS that we
want in this case.

Credit goes to Adam for finding the issue!

llvm-svn: 299482
2017-04-04 22:55:53 +00:00
Evgeniy Stepanov 12de7b2446 Change section flag character for SHF_LINK_ORDER to "o".
GAS uses "m" as a compatibility alias for "M" (SHF_MERGE).

"o" is free, except on ia64, where it already means SHF_LINK_ORDER.

llvm-svn: 299479
2017-04-04 22:35:08 +00:00
Petr Hosek 9eb0a1e09b [AArch64][Fuchsia] Allow -mcmodel=kernel for --target=aarch64-fuchsia
This mode is just like -mcmodel=small except that it moves the
thread pointer from TPIDR_EL0 to TPIDR_EL1.

Patch by Roland McGrath.

Differential Revision: https://reviews.llvm.org/D31624

llvm-svn: 299462
2017-04-04 19:51:53 +00:00
Matt Arsenault 3333968771 Verifier: Check some amdgpu calling convention restrictions
llvm-svn: 299457
2017-04-04 18:43:11 +00:00
Matt Arsenault 3e90f84806 AMDGPU: Remove legacy export intrinsic
llvm-svn: 299444
2017-04-04 16:34:39 +00:00
Matt Arsenault 236da200f1 AMDGPU: Remove legacy image intrinsics
llvm-svn: 299443
2017-04-04 16:34:35 +00:00
Michael Zuckerman 88fb171015 [X86][LLVM] Converting __mm{|256|512}_movm_epi{8|16|32|64} LLVMIR call into generic intrinsics.
This patch is a part one of two reviews, one for the clang and the other for LLVM. 
The patch deletes the back-end intrinsics and adds support for them in the auto upgrade.

Differential Revision: https://reviews.llvm.org/D31393

llvm-svn: 299432
2017-04-04 13:32:14 +00:00
Daniel Sanders bee5739a7c [tablegen][globalisel] Add support for nested instruction matching.
Summary:
Lift the restrictions that prevented the tree walking introduced in the
previous change and add support for patterns like:
  (G_ADD (G_MUL (G_SEXT $src1), (G_SEXT $src2)), $src3) -> SMADDWrrr $dst, $src1, $src2, $src3
Also adds support for G_SEXT and G_ZEXT to support these cases.

One particular aspect of this that I should draw attention to is that I've
tried to be overly conservative in determining the safety of matches that
involve non-adjacent instructions and multiple basic blocks. This is intended
to be used as a cheap initial check and we may add a more expensive check in
the future. The current rules are:
* Reject if any instruction may load/store (we'd need to check for intervening
  memory operations.
* Reject if any instruction has implicit operands.
* Reject if any instruction has unmodelled side-effects.
See isObviouslySafeToFold().

Reviewers: t.p.northover, javed.absar, qcolombet, aditya_nandakumar, ab, rovka

Reviewed By: ab

Subscribers: igorb, dberris, llvm-commits, kristof.beyls

Differential Revision: https://reviews.llvm.org/D30539

llvm-svn: 299430
2017-04-04 13:25:23 +00:00
Simon Dardis 0a47edb153 [mips] Deal with empty blocks in the mips hazard scheduler
This patch teaches the hazard scheduler how to handle empty blocks
when search for the next real instruction when dealing with forbidden
slots.

Reviewers: slthakur

Differential Revision: https://reviews.llvm.org/D31293

llvm-svn: 299427
2017-04-04 11:28:53 +00:00
Oren Ben Simhon 568fb197da [X86] Add 64 bit pattern matching for PSADBW
PSADBW pattern currently supports the 32 bit IR pattern and only GLT (greather than) comparison.
The patch extends the pattern to catch also 64 bit IR pattern and includes all other comparison types (not only GLT).

Differential Revision: https://reviews.llvm.org/D31577

llvm-svn: 299425
2017-04-04 10:23:18 +00:00
Sanjay Patel a4546efbc8 add/move codegen tests for and/or of setcc; NFC
llvm-svn: 299396
2017-04-03 22:45:46 +00:00
Matt Arsenault b600e138cc AMDGPU: Remove llvm.SI.vs.load.input
llvm-svn: 299391
2017-04-03 21:45:13 +00:00
Matt Arsenault c82768290d DAG: Fix missing legalization for any_extend_vector_inreg operands
llvm-svn: 299389
2017-04-03 21:28:13 +00:00
Simon Pilgrim af33757b5d [X86][SSE]] Lower BUILD_VECTOR with repeated elts as BUILD_VECTOR + VECTOR_SHUFFLE
It can be costly to transfer from the gprs to the xmm registers and can prevent loads merging.

This patch splits vXi16/vXi32/vXi64 BUILD_VECTORS that use the same operand in multiple elements into a BUILD_VECTOR with only a single insertion of each of those elements and then performs an unary shuffle to duplicate the values.

There are a couple of minor regressions this patch unearths due to some missing MOVDDUP/BROADCAST folds that I will address in a future patch.

Note: Now that vector shuffle lowering and combining is pretty good we should be reusing that instead of duplicating so much in LowerBUILD_VECTOR - this is the first of several patches to address this.

Differential Revision: https://reviews.llvm.org/D31373

llvm-svn: 299387
2017-04-03 21:06:51 +00:00
Amjad Aboud 0389f62879 x86 interrupt calling convention: re-align stack pointer on 64-bit if an error code was pushed
The x86_64 ABI requires that the stack is 16 byte aligned on function calls. Thus, the 8-byte error code, which is pushed by the CPU for certain exceptions, leads to a misaligned stack. This results in bugs such as Bug 26413, where misaligned movaps instructions are generated.

This commit fixes the misalignment by adjusting the stack pointer in these cases. The adjustment is done at the beginning of the prologue generation by subtracting another 8 bytes from the stack pointer. These additional bytes are popped again in the function epilogue.

Fixes Bug 26413

Patch by Philipp Oppermann.

Differential Revision: https://reviews.llvm.org/D30049

llvm-svn: 299383
2017-04-03 20:28:45 +00:00
Jun Bum Lim dee5565869 [CodeGenPrep] move aarch64-type-promotion to CGP
Summary:
Move the aarch64-type-promotion pass within the existing type promotion framework in CGP.
This change also support forking sexts when a new sext is required for promotion.
Note that change is based on D27853 and I am submitting this out early to provide a better idea on D27853.

Reviewers: jmolloy, mcrosier, javed.absar, qcolombet

Reviewed By: qcolombet

Subscribers: llvm-commits, aemerson, rengolin, mcrosier

Differential Revision: https://reviews.llvm.org/D28680

llvm-svn: 299379
2017-04-03 19:20:07 +00:00
Matt Arsenault 754dd3eaef AMDGPU: Remove legacy bfe intrinsics
llvm-svn: 299372
2017-04-03 18:08:08 +00:00
Zvi Rackover d76a4d0ac6 Revert "[DAGCombine] A shuffle of a splat is always the splat itself"
This reverts commit r299047 which is incorrect because the
simplification may result in incorrect propogation of undefs to users of
the folded shuffle.

Thanks to Andrea Di Biagio for pointing this out.

llvm-svn: 299368
2017-04-03 17:41:19 +00:00
Simon Pilgrim 0e2f8cd875 [X86][MMX] Improve support for folding fptosi from XMM to MMX
llvm-svn: 299338
2017-04-02 17:45:41 +00:00
Simon Pilgrim ba28263b03 [X86][MMX] Simplify tablegen patterns by always combining MOVDQ2Q from v2i64
llvm-svn: 299336
2017-04-02 16:20:34 +00:00
Simon Pilgrim e56a2d7b4c [X86][MMX] Added support for subvector extraction to MMX register
llvm-svn: 299335
2017-04-02 15:52:28 +00:00
Simon Pilgrim 7fc08a8117 Regenerate test with codegen. NFCI.
llvm-svn: 299333
2017-04-02 14:21:14 +00:00
Simon Pilgrim 841ecebd7c Regenerate test with codegen. NFCI.
llvm-svn: 299332
2017-04-02 13:59:37 +00:00
Simon Pilgrim 637182f262 Regenerate test. NFCI.
llvm-svn: 299331
2017-04-02 13:50:44 +00:00
Simon Pilgrim dddce31eb4 [X86][MMX] Add generic fptosi 4f32-4i32 test
llvm-svn: 299328
2017-04-02 13:10:20 +00:00
Sanjay Patel 665021e7ee [DAGCombiner] enable vector transforms for any/all {sign} bits set/clear
The code already allowed vector types in via "isInteger" (which might want
a more specific name), so use splat-friendly constant predicates to match
those types.

llvm-svn: 299304
2017-04-01 15:05:54 +00:00
Sanjay Patel fe9340c168 [PowerPC, x86] add vector tests for any/all {sign} bits set/clear; NFC
llvm-svn: 299303
2017-04-01 14:32:18 +00:00
Craig Topper 73250168e7 [DAGCombiner] Fix fold (or (shuf A, V_0, MA), (shuf B, V_0, MB)) -> (shuf A, B, Mask) to explicitly ensure that only one of the inputs of each shuffle is a zero vector.
This can only happen when we have a mix of zero and undef elements and the two vectors have a different arrangement of zeros/undefs. The shuffle should eventually be constant folded to all zeros.

Fixes PR32484.

llvm-svn: 299291
2017-04-01 04:26:20 +00:00
Quentin Colombet 49d70d0529 Revert "Feature generic option to setup start/stop-after/before"
This reverts commit r299282.

Didn't intend to commit this :(

llvm-svn: 299288
2017-04-01 01:26:24 +00:00
Quentin Colombet fc8f048c13 Revert "Localizer fun"
This reverts commit r299283.

Didn't intend to commit this :(

llvm-svn: 299287
2017-04-01 01:26:21 +00:00
Quentin Colombet 7f64318938 [RegBankSelect] Support REG_SEQUENCE for generic mapping
REG_SEQUENCE falls into the same category as COPY for operands mapping:
- They don't have MCInstrDesc with register constraints
- The input variable could use whatever register classes
- It is possible to have register class already assigned to the operands

In particular, given REG_SEQUENCE are always target specific because of
the subreg indices. Those indices must apply to the register class of
the definition of the REG_SEQUENCE and therefore, the target must set a
register class to that definition. As a result, the generic code can
always use that register class to derive a valid mapping for a
REG_SEQUENCE.

llvm-svn: 299285
2017-04-01 01:26:14 +00:00
Quentin Colombet 3c40b366c5 Localizer fun
WIP

llvm-svn: 299283
2017-04-01 01:21:28 +00:00
Quentin Colombet ffe3053a66 Feature generic option to setup start/stop-after/before
This patch refactors the code used in llc such that all the users of the
addPassesToEmitFile API have access to a homogeneous way of handling
start/stop-after/before options right out of the box.

Previously each user would have needed to duplicate this logic and set
up its own options.

NFC

llvm-svn: 299282
2017-04-01 01:21:24 +00:00
Krzysztof Parzyszek b326411fdc [Hexagon] Fix typo in HexagonEarlyIfCConv.cpp
Found by PVS-Studio. Fixes llvm.org/PR32480.

llvm-svn: 299258
2017-03-31 20:36:00 +00:00
Sanjay Patel 34da36e74f [DAGCombiner] add fold for 'All sign bits set?'
(and (setlt X,  0), (setlt Y,  0)) --> (setlt (and X, Y),  0)

We have 7 similar folds, but this one got away. The fact that the
x86 test with a branch didn't change is probably a separate bug. We
may also be missing this and the related folds in instcombine.

llvm-svn: 299252
2017-03-31 20:28:06 +00:00
Matt Arsenault 8edfaee7be AMDGPU: Remove unnecessary ands when f16 is legal
Add a new node to act as a fancy bitcast from f16 operations to
i32 that implicitly zero the high 16-bits of the result.

Alternatively could try making v2f16 legal and canonicalizing
on build_vectors.

llvm-svn: 299246
2017-03-31 19:53:03 +00:00
Jan Vesely 3c99441ef4 AMDGPU/R600: Fix amdgpu alias analysis pass.
R600 uses higher AS number to access kernel parameters

Fixes: r298846
Differential Revision: https://reviews.llvm.org/D31520

llvm-svn: 299245
2017-03-31 19:26:23 +00:00
Sanjay Patel a97d36fdda [PowerPC] add tests for setcc+setcc+logic; NFC
These are the same tests added for x86 with r299238,
but PPC doesn't specify all branches as cheap, so we 
see different patterns in tests with branches.

llvm-svn: 299244
2017-03-31 18:51:03 +00:00
Balaram Makam 2aba753e84 [AArch64] Add new subtarget feature to fold LSL into address mode.
Summary:
This feature enables folding of logical shift operations of up to 3 places into addressing mode on Kryo and Falkor that have a fastpath LSL.

Reviewers: mcrosier, rengolin, t.p.northover

Subscribers: junbuml, gberry, llvm-commits, aemerson

Differential Revision: https://reviews.llvm.org/D31113

llvm-svn: 299240
2017-03-31 18:16:53 +00:00
Sanjay Patel 303abb803a [x86] add/consolidate tests for setcc+setcc+and/or; NFC
llvm-svn: 299238
2017-03-31 17:55:07 +00:00
Craig Topper 9601168670 [AVX-512] Update lowering for gather/scatter prefetch intrinsics to match the immediate encodings the frontend uses based on the _MM_HINT_T0/T1 constant values in clang's headers.
Our _MM_HINT_T0/T1 constant values are 3/2 which matches gcc, but not icc or Intel documentation. Interestingly gcc had this same bug on their implementation of the gather/scatter builtins at one point too.

Fixes PR32411.

llvm-svn: 299234
2017-03-31 17:24:29 +00:00
Petar Jovanovic 9bff3b7818 [mips][msa] Prevent output operand from commuting for dpadd_[su].df ins
Implementation of TargetInstrInfo::findCommutedOpIndices for MIPS target,
restricting commutativity to second and third operand only for
dpaadd_[su].df instructions therein.

Prior to this change, there were cases where the vector that is to be added
to the dot product of the other two could take a position other than the
first one in the instruction, generating false output in the destination
vector.

Such behavior has been noticed in the two functions generating v2i64 output
values so far. Other ones may exhibit such behavior as well, just not for
the vector operands which are present in the test at the moment.

Tests altered so that the function's first operand is a constant splat so
that it can be loaded with a ldi instruction, since that is the case in
which the erroneous instruction operand placement has occurred. We check
that the register which is present in the ldi instruction is placed as the
first operand in the corresponding dpadd instruction.

Patch by Stefan Maksimovic.

Differential Revision: https://reviews.llvm.org/D30827

llvm-svn: 299223
2017-03-31 14:31:55 +00:00
Simon Pilgrim 1cdbfe44b1 [DAGCombiner] Add ComputeNumSignBits vector demanded elements support to ASHR and INSERT_VECTOR_ELT
Followup to D31311

llvm-svn: 299221
2017-03-31 14:21:50 +00:00
Jonas Paulsson c7bb22e75f [SystemZ] Make sure of correct regclasses in insertSelect()
Since LOCR only accepts GR32 virtual registers, its operands must be copied
into this regclass in insertSelect(), when an LOCR is built. Otherwise, the
case where the source operand was GRX32 will produce invalid IR.

Review: Ulrich Weigand
llvm-svn: 299220
2017-03-31 14:06:59 +00:00
Simon Pilgrim 3c81c34d8d [DAGCombiner] Add vector demanded elements support to ComputeNumSignBits
Currently ComputeNumSignBits returns the minimum number of sign bits for all elements of vector data, when we may only be interested in one/some of the elements.

This patch adds a DemandedElts argument that allows us to specify the elements we actually care about. The original ComputeNumSignBits implementation calls with a DemandedElts demanding all elements to match current behaviour. Scalar types set this to 1.

I've only added support for BUILD_VECTOR and EXTRACT_VECTOR_ELT so far, all others will default to demanding all elements but can be updated in due course.

Followup to D25691.

Differential Revision: https://reviews.llvm.org/D31311

llvm-svn: 299219
2017-03-31 13:54:09 +00:00
Jonas Paulsson 56bb0857e9 [SystemZ] Skip DAGCombining of vector node for older subtargets.
Even on older subtargets that lack vector support, there may be vector values
with just one element in the input program. These are converted during DAG
legalization to scalar values.

The pre-legalize SystemZ DAGCombiner methods should in this circumstance not
touch these nodes. This patch adds a check for this in
SystemZTargetLowering::combineEXTRACT_VECTOR_ELT().

Review: Ulrich Weigand
llvm-svn: 299213
2017-03-31 13:22:59 +00:00
Sam Kolton 27e0f8bc72 [AMDGPU] SDWA Peephole: improve search for immediates in SDWA patterns
Previously compiler often extracted common immediates into specific register, e.g.:
```
%vreg0 = S_MOV_B32 0xff;
%vreg2 = V_AND_B32_e32 %vreg0, %vreg1
%vreg4 = V_AND_B32_e32 %vreg0, %vreg3
```
Because of this SDWA peephole failed to find SDWA convertible pattern. E.g. in previous example this could be converted into 2 SDWA src operands:
```
SDWA src: %vreg2 src_sel:BYTE_0
SDWA src: %vreg4 src_sel:BYTE_0
```
With this change peephole check if operand is either immediate or register that is copy of immediate.

llvm-svn: 299202
2017-03-31 11:42:43 +00:00
Eric Christopher 9fd267c221 Temporarily revert "[PPC] In PPCBoolRetToInt change the bool value to i64 if the target is ppc64" as it's causing test failures, I've given Carrot a testcase offline.
This reverts commit r298955.

llvm-svn: 299153
2017-03-31 02:16:54 +00:00
Eric Christopher bf8f759305 Add testcase for r299124.
Patch by Tim Shen!

llvm-svn: 299125
2017-03-30 22:35:10 +00:00
Matt Arsenault 79f837c254 AMDGPU: Add all atomicrmw fields to atomic.inc/dec
Add scope, order, isVolatile

llvm-svn: 299122
2017-03-30 22:21:40 +00:00
Stanislav Mekhanoshin 89653dfd2a [AMDGPU] Add GlobalOpt parameter to Always Inliner pass
If set to false it does not remove global aliases. With this parameter
set to false it should be safe to run the pass before link.

Differential Revision: https://reviews.llvm.org/D31489

llvm-svn: 299108
2017-03-30 20:16:02 +00:00
Adam Nemet edaec6de73 [DAGCombiner] Initial support for the fast-math flag contract
Now alternatively to the TargetOption.AllowFPOpFusion global flag, FMUL->FADD
can also use the per operation FMF to allow fusion.

The idea here is not to port everything to the new scheme (e.g. fused
multiply-and-sub will be ported later) but that this work all the way from
clang.

The transformation is conditionalized on *both* the FADD and the FMUL having
the FMF contract flag.

Differential Revision: https://reviews.llvm.org/D31169

llvm-svn: 299096
2017-03-30 18:53:04 +00:00
Ahmed Bougacha 6dd6082472 [CodeGen] Pass SDAG an ORE, and replace FastISel stats with remarks.
In the long-term, we want to replace statistics with something
finer-grained that lets us gather per-function data.
Remarks are that replacement.

Create an ORE instance in SelectionDAGISel, and pass it to
SelectionDAG.

SelectionDAG was used so that we can emit remarks from all
SelectionDAG-related code, including TargetLowering and DAGCombiner.
This isn't used in the current patch but Adam tells me he's interested
for the fp-contract combines.

Use the ORE instance to emit FastISel failures as remarks (instead of
the mix of dbgs() dumps and statistics that we currently have).

Eventually, we want to have an API that tells us whether remarks are
enabled (http://llvm.org/PR32352) so that we don't emit expensive
remarks (in this case, dumping IR) when it's not needed.  For now, use
'isEnabled' as a crude replacement.

This does mean that the replacement for '-fast-isel-verbose' is now
'-pass-remarks-missed=isel'.  Additionally, clang users also need to
enable remark diagnostics, using '-Rpass-missed=isel'.

This also removes '-fast-isel-verbose2': there are no static statistics
that we want to only enable in asserts builds, so we can always use
the remarks regardless of the build type.

Differential Revision: https://reviews.llvm.org/D31405

llvm-svn: 299093
2017-03-30 17:49:58 +00:00
Zvi Rackover 7569436f81 [DAGCombine] A shuffle of a splat is always the splat itself
Summary:
Add a simplification:
shuffle (splat-shuffle), undef, M --> splat-shuffle

Fixes pr32449

Patch by Sanjay Patel

Reviewers: eli.friedman, RKSimon, spatel

Reviewed By: spatel

Subscribers: llvm-commits

Differential Revision: https://reviews.llvm.org/D31426

llvm-svn: 299047
2017-03-30 01:42:57 +00:00
Sanjay Patel b8a728f993 [CodeGen] clean up and add tests for scalar and-of-setcc; NFC
https://bugs.llvm.org/show_bug.cgi?id=32401

llvm-svn: 299034
2017-03-29 21:58:52 +00:00
Simon Pilgrim 2845189bd1 [X86][AVX2] Prevent unary interleaving patterns from calling lowerVectorShuffleAsSplitOrBlend (PR32453)
llvm-svn: 298993
2017-03-29 13:00:00 +00:00
Simon Pilgrim be22cff6fd [X86][MMX] Added generic sitofp test to compare against existing cvtdq2ps test.
llvm-svn: 298989
2017-03-29 10:47:18 +00:00
Craig Topper d9f51350b8 [AVX-512] Remove explicit KMOVWrk from isel patterns. COPY_TO_REGCLASS to GR32 is enough.
llvm-svn: 298985
2017-03-29 07:31:56 +00:00
Craig Topper d284606327 [AVX-512] Remove explicit KMOVWrk/KMOVWKr instructions from patterns where we can just use COPY_TO_REGCLASS instead.
This will result in a KMOVW or KMOVD being emitted during register allocation. And in at least some cases this might allow the register coalescer to remove the copy all together.

llvm-svn: 298984
2017-03-29 06:55:28 +00:00
Adam Nemet 92a5cf4366 [SDAG] Remove -enable-fmf-dag
This is no longer needed as spotted by Sanjay in
https://reviews.llvm.org/D31165.

llvm-svn: 298963
2017-03-28 23:46:14 +00:00
Craig Topper a795be60c1 [AVX-512] Add test case that was supposed to go with r298957.
llvm-svn: 298959
2017-03-28 23:29:35 +00:00
Guozhi Wei f8d40181c9 [PPC] In PPCBoolRetToInt change the bool value to i64 if the target is ppc64
In PPCBoolRetToInt bool value is changed to i32 type. On ppc64 it may introduce an extra zero extension for the return value. This patch changes the integer type to i64 to avoid the zero extension on ppc64.

This patch fixed PR32442.

Differential Revision: https://reviews.llvm.org/D31407

llvm-svn: 298955
2017-03-28 22:55:01 +00:00
Eric Christopher 69b191c628 Add a similar test for tailcall optimization as in r270287 for aarch64.
llvm-svn: 298952
2017-03-28 22:37:43 +00:00
Stanislav Mekhanoshin baf31ac7c8 [AMDGPU] Boost unroll threshold for loops reading local memory
This is less important than increase threshold for private memory,
but still brings performance improvements in a wide range of tests.
Unrolling more for local memory serves three purposes: it allows
to combine ds operations if offset becomes static, saves registers
used for offsets in case of static offsets, and allows better lds
latency hiding.

Differential Revision: https://reviews.llvm.org/D31412

llvm-svn: 298948
2017-03-28 22:13:51 +00:00
Simon Pilgrim c7c5aa47cf [X86][MMX] Match MMX fp_to_sint conversions from XMM registers
We currently perform the various fp_to_sint XMM conversion and then transfer to the MMX register (on 32-bit via the stack).

This patch improves support for MOVDQ2Q XMM to MMX transfers and adds the XMM->MMX fp_to_sint direct conversion patterns. The SSE2 specifications are the same as for XMM->XMM and XMM->MMX rounding/exceptions/etc.

Differential Revision: https://reviews.llvm.org/D30868

llvm-svn: 298943
2017-03-28 21:32:11 +00:00
Stanislav Mekhanoshin 9053f22eeb [AMDGPU] Split -amdgpu-early-inline-all option
Previously it was covered by the internalization. It turns out we cannot
run internalizer in FE, it break separate compilation tests. Thus early
inliner gets its own option.

Differential Revision: https://reviews.llvm.org/D31429

llvm-svn: 298935
2017-03-28 18:23:24 +00:00
Sanjay Patel f01a1dad7f [x86] use VPMOVMSK to replace memcmp libcalls for 32-byte equality
Follow-up to:
https://reviews.llvm.org/rL298775

llvm-svn: 298933
2017-03-28 17:23:49 +00:00
Nirav Dave 472b5efc8b [SDAG] Deal with deleted node in PromoteIntShiftOp
Deal with case that initial node is deleted during dag-combine leading
to an assertional failure in promoteIntShiftOp.

Fixes PR32420.

Reviewers: spatel, RKSimon

Subscribers: llvm-commits

Differential Revision: https://reviews.llvm.org/D31403

llvm-svn: 298931
2017-03-28 17:09:49 +00:00
Zvi Rackover a4c354951b Add reproducer test for pr32449. NFC.
llvm-svn: 298930
2017-03-28 16:45:23 +00:00
Simon Pilgrim 3e2aa7f40e [X86][AVX2] Add support for combining v16i16 shuffles to VPBLENDW
llvm-svn: 298929
2017-03-28 16:40:38 +00:00
Craig Topper 058f2f6d72 [AVX-512] Fix accidental uses of AH/BH/CH/DH after copies to/from mask registers
We've had several bugs(PR32256, PR32241) recently that resulted from usages of AH/BH/CH/DH either before or after a copy to/from a mask register.

This ultimately occurs because we create COPY_TO_REGCLASS with VK1 and GR8. Then in CopyToFromAsymmetricReg in X86InstrInfo we find a 32-bit super register for the GR8 to emit the KMOV with. But as these tests are demonstrating, its possible for the GR8 register to be a high register and we end up doing an accidental extra or insert from bits 15:8.

I think the best way forward is to stop making copies directly between mask registers and GR8/GR16. Instead I think we should restrict to only copies between mask registers and GR32/GR64 and use EXTRACT_SUBREG/INSERT_SUBREG to handle the conversion from GR32 to GR16/8 or vice versa.

Unfortunately, this complicates fastisel a bit more now to create the subreg extracts where we used to create GR8 copies. We can probably make a helper function to bring down the repitition.

This does result in KMOVD being used for copies when BWI is available because we don't know the original mask register size. This caused a lot of deltas on tests because we have to split the checks for KMOVD vs KMOVW based on BWI.

Differential Revision: https://reviews.llvm.org/D30968

llvm-svn: 298928
2017-03-28 16:35:29 +00:00
Sanjay Patel 5d39a98612 [x86] add separate check prefix for SSE; NFC
We want to check each test on each target, so we need another prefix
when SSE and AVX diverge (as they will if we handle 32-byte and higher). 

llvm-svn: 298926
2017-03-28 15:55:50 +00:00
Nirav Dave 5b414ebe63 [SDAG] Avoid deleted SDNodes PromoteIntBinOp
Reorder work in PromoteIntBinOp to prevent stale (deleted) nodes from
being used.

Fixes PR32340 and PR32345.

Reviewers: hfinkel, dbabokin

Subscribers: llvm-commits

Differential Revision: https://reviews.llvm.org/D31148

llvm-svn: 298923
2017-03-28 15:41:12 +00:00
Nirav Dave 9b5563c52c [SDAG] Fix Stale SDNode usage in visitAND
Reorder CombineTo Calls to prevent potential use of deleted node.
Fixes PR32372.

Reviewers: jnspaulsson, RKSimon, uweigand, jonpa

Reviewed By: jonpa

Subscribers: jonpa, llvm-commits

Differential Revision: https://reviews.llvm.org/D31346

llvm-svn: 298920
2017-03-28 14:11:20 +00:00
Sanjay Patel e4f11334fa [x86] add AVX2 run to show 256-bit opportunity; NFC
llvm-svn: 298918
2017-03-28 13:46:50 +00:00
Igor Breger f580fce2c3 [GlobalISel][X86] support G_FRAME_INDEX instruction selection.
Summary:
    G_LOAD/G_STORE, add alternative RegisterBank mapping.
    For G_LOAD, Fast and Greedy mode choose the same RegisterBank mapping (GprRegBank ) for the G_GLOAD + G_FADD , can't get rid of cross register bank copy GprRegBank->VecRegBank.

    Reviewers: zvi, rovka, qcolombet, ab

    Reviewed By: zvi

    Subscribers: llvm-commits, dberris, kristof.beyls, eladcohen, guyblank

    Differential Revision: https://reviews.llvm.org/D30979

llvm-svn: 298907
2017-03-28 09:35:06 +00:00
Renato Golin be2f7d9d61 [ARM] Mark falky test unsupported until we find the cause
llvm-svn: 298887
2017-03-27 22:38:43 +00:00
Javed Absar 3d59437093 Improve machine schedulers for in-order processors
This patch enables schedulers to specify instructions that 
cannot be issued with any other instructions.
It also fixes BeginGroup/EndGroup.

Reviewed by: Andrew Trick
Differential Revision: https://reviews.llvm.org/D30744

llvm-svn: 298885
2017-03-27 20:46:37 +00:00
Ahmed Bougacha f75782f9dc [GlobalISel][AArch64] Fold FI into LDR/STR ui addressing mode.
A majority of loads and stores at O0 access an alloca.

It's trivial to fold the G_FRAME_INDEX into the instruction; do it.

llvm-svn: 298864
2017-03-27 17:31:56 +00:00
Ahmed Bougacha 8a654085d0 [GlobalISel][AArch64] Fold G_GEP into LDR/STR ui addressing mode.
We're not to the point of supporting the load/store patterns yet
(because they extensively use PatFrags).

But in the meantime, we can implement some of the simplest addressing
modes.

llvm-svn: 298863
2017-03-27 17:31:52 +00:00
Ahmed Bougacha 85a66a6d9f [GlobalISel][AArch64] Select store of zero to WZR/XZR.
These occur very frequently, and are quite trivial to catch.

llvm-svn: 298862
2017-03-27 17:31:48 +00:00
Ahmed Bougacha 641cb203b6 [GlobalISel][AArch64] Select CBZ.
CBZ/CBNZ represent a substantial portion of all conditional branches.
Look through G_ICMP to select them.

We can't use tablegen yet because the existing patterns match an
AArch64ISD node.

llvm-svn: 298856
2017-03-27 16:35:31 +00:00
Ahmed Bougacha c1cbcee170 [GlobalISel][AArch64] Use proper constant types in test. NFC.
llvm-svn: 298854
2017-03-27 16:35:23 +00:00
Chad Rosier 862a41270f [AArch64] Mark mrs of TPIDR_EL0 (thread pointer) as not having side effects.
Among other things, this allows Machine LICM to hoist a costly 'mrs'
instruction from within a loop.

Differential Revision: http://reviews.llvm.org/D31151

llvm-svn: 298851
2017-03-27 15:52:38 +00:00
Gadi Haber 89d5f9391a [X86][AVX2] bugzilla bug 21281 Performance regression in vector interleave in AVX2
This is a patch for an on-going bugzilla bug 21281 on the generated X86 code for a matrix transpose8x8 subroutine which requires vector interleaving. The generated code in AVX2 is currently non-optimal and requires 60 instructions as opposed to only 40 instructions generated for AVX1.
 The patch includes a fix for the AVX2 case where vector unpack instructions use less operations than the vector blend operations available in AVX2.
 In this case using vector unpack instructions is more efficient.

Reviewers:
zvi  
delena  
igorb  
craig.topper  
guyblank  
eladcohen  
m_zuckerman  
aymanmus  
RKSimon 

llvm-svn: 298840
2017-03-27 12:13:37 +00:00
Simon Pilgrim 92925ea701 [X86][SSE] Add computeKnownBitsForTargetNode support for (V)PSLL/(V)PSRL instructions
llvm-svn: 298806
2017-03-26 13:17:55 +00:00
Simon Pilgrim 049d9c921f [X86][AVX512F] Fix reg class for VMOVSSZrr/VMOVSSZrrk and VMOVSDZrr/VMOVSDZrrk
Fixed -verify-machineinstrs errors in fast-isel-select-sse.ll (one of many in PR27481)

The VMOVSSZrr/VMOVSSZrrk and VMOVSDZrr/VMOVSDZrrk instructions were assuming both source registers were V128X when the second is actually supposed to be FR32X/FR64X

Differential Revision: https://reviews.llvm.org/D31200

llvm-svn: 298805
2017-03-26 12:52:28 +00:00
Simon Pilgrim 544f750de6 Regenerate test
llvm-svn: 298803
2017-03-26 10:33:03 +00:00
Simon Pilgrim a2b81dc411 Regenerate test
The CHECK-DAG aren't necessary and get in the way of automated checks

llvm-svn: 298802
2017-03-26 10:31:37 +00:00
Simon Pilgrim 1d8235a022 Regenerate tests to remove duplicated checks
llvm-svn: 298801
2017-03-26 10:28:39 +00:00
Igor Breger 531a203a06 [GlobalISel][X86] support G_FRAME_INDEX instruction selection.
Summary:
    Support G_FRAME_INDEX instruction selection.

    Reviewers: zvi, rovka, ab, qcolombet

    Reviewed By: ab

    Subscribers: llvm-commits, dberris, kristof.beyls, eladcohen, guyblank

    Differential Revision: https://reviews.llvm.org/D30980

llvm-svn: 298800
2017-03-26 08:11:12 +00:00
Simon Pilgrim c0720a4052 [X86][SSE] Combine (VSRLI (VSRAI X, Y), (NumSignBits-1)) -> (VSRLI X, (NumSignBits-1))
Part 3 of 3.

Differential Revision: https://reviews.llvm.org/D31347

llvm-svn: 298782
2017-03-25 20:43:01 +00:00
Simon Pilgrim 6397963c81 [X86][SSE] Added ComputeNumSignBitsForTargetNode support for (V)PSRAI
Part 2 of 3.

Differential Revision: https://reviews.llvm.org/D31347

llvm-svn: 298780
2017-03-25 19:58:36 +00:00
Sanjay Patel 9ebb68843e [x86] use PMOVMSK to replace memcmp libcalls for 16-byte equality
This is the payoff for D31156 - if a target has efficient comparison instructions for vector-sized equality, 
we can replace memcmp calls with inline code that is both smaller and faster.

Differential Revision: https://reviews.llvm.org/D31290

llvm-svn: 298775
2017-03-25 16:05:33 +00:00
Simon Pilgrim c3e5c3c5bc [X86][SSE] Add extra computeNumSignBits test case for D31311.
llvm-svn: 298774
2017-03-25 15:43:36 +00:00
Yaxun Liu 14834c3e3d [AMDGPU] Switch data layout by triple environment amdgiz
Switch data layout by target triple environment amdgiz and amdgizcl indicating using of an address space mapping in which generic address space is 0.

amdgiz is for non-OpenCL environment where generic address space is 0.

amdgizcl is for OpenCL environment where generic address space is 0.

Differential Revision: https://reviews.llvm.org/D31211

llvm-svn: 298758
2017-03-25 02:05:44 +00:00
Eli Friedman 95ddd18703 [ARM] Fix mixup between Lo and Hi in SMLALBB formation.
llvm-svn: 298752
2017-03-25 00:13:24 +00:00
Sanjay Patel 650599d84d [x86] add 32-bit RUN for better memcmp coverage; NFC
llvm-svn: 298744
2017-03-24 22:09:48 +00:00
Matt Arsenault 0607a4427b AMDGPU: Fix annotating loops with nested loop conditions
If the branch condition for a loop was a phi which itself
was fed from a phi from a loop, it isn't safe to try
to delete the phi until after the loop is handled.

llvm-svn: 298737
2017-03-24 20:57:10 +00:00
Matt Arsenault b5d23271e2 AMDGPU: Implement f16 fround
llvm-svn: 298730
2017-03-24 20:04:18 +00:00
Matt Arsenault b8f8dbc227 AMDGPU: Unify divergent function exits.
StructurizeCFG can't handle cases with multiple
returns creating regions with multiple exits.
Create a copy of UnifyFunctionExitNodes that only
unifies exit nodes that skips exit nodes
with uniform branch sources.

llvm-svn: 298729
2017-03-24 19:52:05 +00:00
Stanislav Mekhanoshin 70603dcef2 [AMDGPU] Fold V_CNDMASK with identical source operands
Such instructions sometimes appear after lowering and folding.

Differential Revision: https://reviews.llvm.org/D31318

llvm-svn: 298723
2017-03-24 18:55:20 +00:00
Konstantin Zhuravlyov 4986d9fb45 [AMDGPU] Rename Kind to ValueKind in metadata to be consistent
llvm-svn: 298722
2017-03-24 18:43:15 +00:00
Stanislav Mekhanoshin a27b2cac03 [AMDGPU] Add AMDGPUAliasAnalysis to opt pipeline
Previously it was added only to the BE.

Differential Revision: https://reviews.llvm.org/D31323

llvm-svn: 298721
2017-03-24 18:01:14 +00:00
Simon Pilgrim 31c04590e6 [X86][SSE] Add ashr + mask test cases.
Test cases showing cases where we're missing an opportunity to lshr a value with an extended sign to avoid loading a mask

llvm-svn: 298716
2017-03-24 17:25:47 +00:00
Dehao Chen b197d5b0a0 Fix trellis layout to avoid mis-identify triangle.
Summary:
For the following CFG:

A->B
B->C
A->C

If there is another edge B->D, then ABC should not be considered as triangle.

Reviewers: davidxl, iteratee

Reviewed By: iteratee

Subscribers: nemanjai, llvm-commits

Differential Revision: https://reviews.llvm.org/D31310

llvm-svn: 298661
2017-03-23 23:28:09 +00:00
Krzysztof Parzyszek 10fbac009d [Hexagon] Avoid infinite loops in HexagonLoopIdiomRecognition
- Avoid explosive growth of the simplification queue by not queuing
  expressions that are alredy in it.
- Add an iteration counter and abort after a sufficiently large number
  of iterations (assuming that it's a symptom of an infinite loop).

llvm-svn: 298655
2017-03-23 23:01:22 +00:00
Nirav Dave 9ebefeb9b1 [X86] Fix Stale SDNode use in X86ISelDAGtoDAG
Summary: Fixes pr32329.

Reviewers: spatel, craig.topper

Subscribers: llvm-commits

Differential Revision: https://reviews.llvm.org/D31286

llvm-svn: 298633
2017-03-23 18:25:17 +00:00
Pirama Arumuga Nainar bc26482717 [ARM] Fix computeKnownBits for ARMISD::CMOV
Summary:
The true and false operands for the CMOV are operands 0 and 1.
ARMISelLowering.cpp::computeKnownBits was looking at operands 1 and 2
instead.  This can cause CMOV instructions to be incorrectly folded into
BFI if value set by the CMOV is another CMOV, whose known bits are
computed incorrectly.

This patch fixes the issue and adds a test case.

Reviewers: kristof.beyls, jmolloy

Subscribers: llvm-commits, aemerson, srhines, rengolin

Differential Revision: https://reviews.llvm.org/D31265

llvm-svn: 298624
2017-03-23 16:47:47 +00:00
Simon Pilgrim 1c048ab6ba [X86][SSE] Extract elements from narrower shuffle masks.
Add support for widening narrow shuffle masks so we can directly extract from the relevant input vector of the shuffle.

llvm-svn: 298616
2017-03-23 16:09:34 +00:00
Tim Shen ce26a45f7c [PPC] Add generated tests for all atomic operations
Summary: Add tests for all atomic operations for powerpc64le, so that all changes can be easily examined.

Reviewers: kbarton, hfinkel, echristo

Subscribers: mehdi_amini, nemanjai, llvm-commits

Differential Revision: https://reviews.llvm.org/D31285

llvm-svn: 298614
2017-03-23 16:02:47 +00:00
Sanjay Patel 8db22f9032 [x86] add memcmp tests, remove run
Add tests for vector lengths that could be handled without a libcall.

There's an explicit test for 'nobuiltin', so there's not much value
in a separate run that checks that same behavior over and over again.

llvm-svn: 298611
2017-03-23 15:38:22 +00:00
Igor Breger a8ba572dcf [GlobalISel][X86] Support G_STORE/G_LOAD operation
Summary:
1. Support pointer type as function argumnet and return value
2. G_STORE/G_LOAD - set legal action for i8/i16/i32/i64/f32/f64/vec128
3. RegisterBank - support typeless operations like G_STORE/G_LOAD, for scalar use GPR bank.
4. Support instruction selection for G_LOAD/G_STORE

Reviewers: zvi, rovka, ab, qcolombet

Reviewed By: rovka

Subscribers: llvm-commits, dberris, kristof.beyls, eladcohen, guyblank

Differential Revision: https://reviews.llvm.org/D30973

llvm-svn: 298609
2017-03-23 15:25:57 +00:00
Nirav Dave e9ca32ae52 [SDAG] Fix zeroExtend assertion error
Move CombineTo preventing deleted node from being returned in
visitZERO_EXTEND.

Fixes PR32284.

Reviewers: RKSimon, bogner

Reviewed By: RKSimon

Subscribers: llvm-commits

Differential Revision: https://reviews.llvm.org/D31254

llvm-svn: 298604
2017-03-23 15:01:50 +00:00
Simon Pilgrim d341290b5c [X86][SSE] Add computeNumSignBits test for sitofp of (extended) i64 extracted element
llvm-svn: 298592
2017-03-23 13:18:09 +00:00
Michael Zuckerman 85436ece89 [X86][TD][vpmovm2 ] New TD pattern for the vpmovm2 instruction
Up until now, vpmovm2 instruction described its destination operand size
by the source operand size. This patch adds new pattern for the vpmovm2
instruction. The node describes new expansion of the destination (from
{128|256} to 512).

Differential Revision: https://reviews.llvm.org/D30654

llvm-svn: 298586
2017-03-23 09:57:01 +00:00