17365 Commits

Author SHA1 Message Date
Akira Hatanaka b8d2873d93 [AArch64][Inline-Asm] Return the 32-bit floating point register class
when constraint "w" is used on a 32-bit operand.

This enables compiling the following code, which used to error out in
the backend:

void foo1(int a) {
  asm volatile ("sqxtn h0, %s0\n" : : "w"(a):);
}

Fixes PR28633.

llvm-svn: 276344
2016-07-21 21:39:05 +00:00
Anna Thomas c858faa244 Revert "Invariant start/end intrinsics overloaded for address space"
This reverts commit r276316.

llvm-svn: 276320
2016-07-21 19:06:28 +00:00
Anna Thomas 29b24dfe44 Invariant start/end intrinsics overloaded for address space
Summary:
The llvm.invariant.start and llvm.invariant.end intrinsics currently
support specifying invariant memory objects only in the default address space.

With this change, these intrinsics are overloaded for any address space for memory objects
and we can use these llvm invariant intrinsics in non-default address spaces.

Example: llvm.invariant.start.p1i8(i64 4, i8 addrspace(1)* %ptr)

This overloaded intrinsic is needed for representing final or invariant memory in managed languages.

Reviewers: tstellarAMD, reames, apilipenko

Subscribers: llvm-commits

Differential Revision: https://reviews.llvm.org/D22519

llvm-svn: 276316
2016-07-21 18:41:44 +00:00
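
As a rough illustration of the overloading described in the commit above, the Python sketch below builds the overloaded intrinsic name from an address-space number. The helper name `invariant_start_name` is hypothetical. Only the `.p1i8` suffix for address space 1 comes from the commit message; the `p<N>i8` pattern for other address spaces is an assumption extrapolated from that one example.

```python
def invariant_start_name(addrspace: int) -> str:
    """Return the overloaded intrinsic name for an i8 pointer in `addrspace`.

    Assumption: the suffix follows the pattern p<addrspace>i8, as in the
    llvm.invariant.start.p1i8 example from the commit message.
    """
    if addrspace < 0:
        raise ValueError("address space must be non-negative")
    return f"llvm.invariant.start.p{addrspace}i8"


# Address space 1 reproduces the name used in the commit's example.
print(invariant_start_name(1))
```
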
Quentin Colombet 2b59eab79f [IRTranslator] Add G_SUB opcode.
This commit adds a generic SUB opcode to global-isel.

llvm-svn: 276308
2016-07-21 17:26:50 +00:00
Konstantin Zhuravlyov 3c0d8d22fe [AMDGPU] Emit read-only data to .rodata for hsa
Differential Revision: https://reviews.llvm.org/D22538

llvm-svn: 276298
2016-07-21 15:59:23 +00:00
Quentin Colombet 7bcc921dd8 [IRTranslator] Add G_AND opcode.
This commit adds a generic AND opcode to global-isel.

llvm-svn: 276297
2016-07-21 15:50:42 +00:00
Geoff Berry 4ff2e36d32 [AArch64] Load/store opt: Don't count transient instructions towards search limits.
Summary:
This change also changes findMatchingInsn and
findMatchingUpdateInsnForward to take DBG_VALUE opcodes into account
when tracking register defs and uses, which could potentially inhibit
these optimizations in the presence of debug information.

Reviewers: mcrosier

Subscribers: aemerson, rengolin, mcrosier, llvm-commits

Differential Revision: https://reviews.llvm.org/D22582

llvm-svn: 276293
2016-07-21 15:20:25 +00:00
Simon Pilgrim 88e0940d3b [X86][SSE] Allow folding of store/zext with PEXTRW of 0'th element
Under normal circumstances we prefer the higher performance MOVD to extract the 0'th element of a v8i16 vector instead of PEXTRW.

But as detailed on PR27265, this prevents the SSE41 implementation of PEXTRW from folding the store of the 0'th element. Additionally it prevents us from making use of the fact that the (SSE2) reg-reg version of PEXTRW implicitly zero-extends the i16 element to the i32/i64 destination register.

This patch only preferentially lowers to MOVD if we will not be zero-extending the extracted i16, nor preventing a store from being folded (on SSE41).

Fix for PR27265.

Differential Revision: https://reviews.llvm.org/D22509

llvm-svn: 276289
2016-07-21 14:54:17 +00:00
Simon Pilgrim 4caefdf834 Fixed line endings
llvm-svn: 276287
2016-07-21 14:36:41 +00:00
Simon Pilgrim c8e20b1150 [X86][AVX] Added support for lowering to VBROADCASTF128/VBROADCASTI128
As reported on PR26235, we don't currently make use of the VBROADCASTF128/VBROADCASTI128 instructions (or the AVX512 equivalents) to load+splat a 128-bit vector to both lanes of a 256-bit vector.

This patch enables lowering from subvector insertion/concatenation patterns and auto-upgrades the llvm.x86.avx.vbroadcastf128.pd.256 / llvm.x86.avx.vbroadcastf128.ps.256 intrinsics to match.

We could possibly investigate using VBROADCASTF128/VBROADCASTI128 to load repeated constants as well (similar to how we already do for scalar broadcasts).

Differential Revision: https://reviews.llvm.org/D22460

llvm-svn: 276281
2016-07-21 14:10:54 +00:00
Marina Yatsina c1fa163392 ExecutionDepsFix - Fix bug in clearance calculation
The clearance calculation did not take into account registers defined as outputs or clobbers in inline assembly machine instructions because these register defs are implicit.

Differential Revision: http://reviews.llvm.org/D22580

llvm-svn: 276266
2016-07-21 12:37:07 +00:00
Matt Arsenault f0ba86a4d5 AMDGPU: Fix phis from blocks split due to register indexing
llvm-svn: 276257
2016-07-21 09:40:57 +00:00
Matthias Braun d9fdad72ae IPRA: Fix RegMask calculation for alias registers
This patch fixes a very subtle bug in regmask calculation. Thanks to zan
jyu Wong <zyfwong@gmail.com> for bringing this to notice.
For example, if only CL is clobbered then CH should not be marked
clobbered, but CX, ECX and RCX should be marked clobbered. Previously, for
each modified register all of its aliases were marked clobbered by
markRegClobbred() in RegUsageInfoCollector.cpp, but that is wrong: when
CL is clobbered, MRI::isPhysRegModified() returns true for CL, CX, ECX
and RCX, which is correct behavior, but then for CX, ECX and RCX we also
marked CH clobbered, because CH aliases CX, ECX and RCX. So
markRegClobbred() is not required, because isPhysRegModified() already
takes care of register aliasing. A very simple test case has been
added to verify this change.
Please find relevant bug report here :
http://llvm.org/PR28567

Patch by Vivek Pandya <vivekvpandya@gmail.com>

Differential Revision: https://reviews.llvm.org/D22400

llvm-svn: 276235
2016-07-21 03:50:39 +00:00
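
The aliasing rule from the IPRA commit above can be sketched in Python. This is a toy model, not the LLVM implementation: registers are modeled as bit ranges inside RCX, and the function names (`overlaps`, `clobbered_fixed`, `clobbered_buggy`) are made up for the illustration. It only shows why marking the aliases of every modified register over-approximates.

```python
# Bit ranges (low, high) that each register occupies within RCX.
REG_BITS = {
    "CL": (0, 7),
    "CH": (8, 15),
    "CX": (0, 15),
    "ECX": (0, 31),
    "RCX": (0, 63),
}


def overlaps(a: str, b: str) -> bool:
    """Two registers alias when their bit ranges intersect."""
    lo_a, hi_a = REG_BITS[a]
    lo_b, hi_b = REG_BITS[b]
    return lo_a <= hi_b and lo_b <= hi_a


def clobbered_fixed(written: set) -> set:
    """A register is clobbered if it aliases something actually written."""
    return {r for r in REG_BITS if any(overlaps(r, w) for w in written)}


def clobbered_buggy(written: set) -> set:
    """Old behavior: additionally mark every alias of every modified register."""
    modified = clobbered_fixed(written)
    return {r for r in REG_BITS if any(overlaps(r, m) for m in modified)}


print(sorted(clobbered_fixed({"CL"})))
print(sorted(clobbered_buggy({"CL"})))
```

In this model, writing only CL leaves CH out of the fixed set, while the second pass in the buggy variant pulls CH in through CX.
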
Justin Lebar cd564c6b46 [NVPTX] Enable the load-store vectorizer on nvptx.
Reviewers: tra

Subscribers: jholewinski, arsenm, asbirlea

Differential Revision: https://reviews.llvm.org/D22592

llvm-svn: 276196
2016-07-20 22:11:36 +00:00
Artem Belevich 7e9c9a6582 [NVPTX] Renamed NVPTXLowerKernelArgs -> NVPTXLowerArgs. NFC.
After r276153 the pass applies to both kernels and regular functions.

Differential Revision: https://reviews.llvm.org/D22583

llvm-svn: 276189
2016-07-20 21:44:07 +00:00
Ahmed Bougacha a0cdd79070 [AArch64][FastISel] Select -O0 legal cmpxchg.
At -O0, cmpxchg survives AtomicExpand: it's mostly straightforward
to select it in fast-isel, and let the pseudo be expanded later.

extractvalues on the result are the tricky part: the generic logic
only works for legal types (and it would be painful to make it
support illegal types), so we can only support i32/i64 cmpxchg.

llvm-svn: 276183
2016-07-20 21:12:32 +00:00
Ahmed Bougacha b0674d1143 [AArch64][FastISel] Select atomic stores into STLR.
llvm-svn: 276182
2016-07-20 21:12:27 +00:00
Tim Northover 62ae568bbb GlobalISel: implement low-level type with just size & vector lanes.
This should be all the low-level instruction selection needs to determine how
to implement an operation, with the remaining context taken from the opcode
(e.g. G_ADD vs G_FADD) or other flags not based on type (e.g. fast-math).

llvm-svn: 276158
2016-07-20 19:09:30 +00:00
Artem Belevich 74158b5061 [NVPTX] deal with all aggregate return types.
Fixes a crash in llvm_unreachable when a function has array return type.

Differential Revision: https://reviews.llvm.org/D22524

llvm-svn: 276154
2016-07-20 18:39:52 +00:00
Artem Belevich b2e76a5e7a [NVPTX] Improve lowering of byval args of device functions.
Avoid unnecessary spills of byval arguments of device functions to
local space on SASS level and subsequent pointer conversion to generic
address space that follows. Instead, make a local copy in IR, provide
a way to access arguments directly, and let LLVM optimize the copy away
when possible.

Differential Revision: https://reviews.llvm.org/D21421

llvm-svn: 276153
2016-07-20 18:39:47 +00:00
Matt Arsenault f14db7a933 AMDGPU: Add missing test coverage for control flow breaks
None of the current lit tests hit si_break handling.

llvm-svn: 276129
2016-07-20 15:20:35 +00:00
Yaxun Liu 4b1d9f7f18 AMDGPU: Fix bug causing crash due to invalid opencl version metadata.
Differential Revision: https://reviews.llvm.org/D22526

llvm-svn: 276119
2016-07-20 14:38:06 +00:00
Diana Picus f345d40ae2 [ARM] Skip inline asm memory operands in DAGToDAGISel
Retry r275776 (no changes, we suspect the issue was with another commit).

The current logic for handling inline asm operands in DAGToDAGISel interprets
the operands by looking for constants, which should represent the flags
describing the kind of operand we're dealing with (immediate, memory, register
def etc). The operands representing actual data are skipped only if they are
non-const, with the exception of immediate operands which are skipped explicitly
when a flag describing an immediate is found.

The oversight is that memory operands may be const too (e.g. for device drivers
reading a fixed address), so we should explicitly skip the operand following a
flag describing a memory operand. If we don't, we risk interpreting that
constant as a flag, which is definitely not intended.

Fixes PR26038

Differential Revision: https://reviews.llvm.org/D22103

llvm-svn: 276101
2016-07-20 09:48:24 +00:00
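
The operand walk described in the ARM inline asm commit above can be modeled with a short Python sketch. The operand encoding here is entirely hypothetical (tuples of `("const", value)` or `("reg", name)` and made-up flag values `FLAG_REGDEF`, `FLAG_IMM`, `FLAG_MEM`); it is not the DAGToDAGISel data structure, only an illustration of why the operand after a memory flag has to be skipped explicitly.

```python
# Hypothetical flag values; real inline asm flag words are encoded differently.
FLAG_REGDEF = 1
FLAG_IMM = 2
FLAG_MEM = 3


def collect_flags(operands, skip_memory_data):
    """Return the constants that the walk interprets as flags."""
    flags = []
    i = 0
    while i < len(operands):
        kind, value = operands[i]
        i += 1
        if kind != "const":
            continue  # non-constant operands are data and are skipped
        flags.append(value)
        # Immediates are always skipped explicitly; memory data only
        # when the fix is enabled.
        if value == FLAG_IMM or (skip_memory_data and value == FLAG_MEM):
            i += 1
    return flags


# A memory operand at a fixed (constant) address, then a register def.
ops = [("const", FLAG_MEM), ("const", 0x1000),
       ("const", FLAG_REGDEF), ("reg", "r0")]
print(collect_flags(ops, skip_memory_data=False))  # 0x1000 misread as a flag
print(collect_flags(ops, skip_memory_data=True))
```
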
David Majnemer 5d26127752 Revert "Disable this-return argument forwarding on ARM/AArch64"
Inference of the 'returned' attribute was fixed in r276008, let's try
turning the backend support back on.

This reverts commit r275677.

llvm-svn: 276081
2016-07-20 04:13:01 +00:00
Matthias Braun 5b9722d6c7 Revert "RegScavenging: Add scavengeRegisterBackwards()"
Reverting this commit for now as it seems to be causing failures on
test-suite tests on the clang-ppc64le-linux-lnt bot.

This reverts commit r276044.

llvm-svn: 276068
2016-07-20 00:21:32 +00:00
Matt Arsenault a1fe17c9ad AMDGPU: Change fdiv lowering based on !fpmath metadata
If 2.5 ulp is acceptable, denormals are not required, and the
operation isn't a reciprocal (which will already be handled), replace
it with a faster fdiv.

Simplify the lowering tests by using per function
subtarget features.

llvm-svn: 276051
2016-07-19 23:16:53 +00:00
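
The three conditions listed in the fdiv commit above amount to a simple predicate. The Python sketch below restates them; the function and parameter names are hypothetical, and `max_ulp_error` stands in for the accuracy taken from the !fpmath metadata.

```python
def can_use_fast_fdiv(max_ulp_error: float,
                      denormals_required: bool,
                      is_reciprocal: bool) -> bool:
    """True when all three conditions from the commit message hold."""
    return (max_ulp_error >= 2.5        # 2.5 ulp of error is acceptable
            and not denormals_required  # denormal results are not needed
            and not is_reciprocal)      # reciprocals are handled elsewhere


print(can_use_fast_fdiv(2.5, denormals_required=False, is_reciprocal=False))
```
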
Matthias Braun 84fd4bee6c RegScavenging: Add scavengeRegisterBackwards()
This is a variant of scavengeRegister() that works for
enterBasicBlockEnd()/backward(). The benefit of the backward mode is
that it is not affected by incomplete kill flags.

This patch also changes
PrologEpilogInserter::doScavengeFrameVirtualRegs() to use the register
scavenger in backwards mode.

Differential Revision: http://reviews.llvm.org/D21885

llvm-svn: 276044
2016-07-19 22:37:09 +00:00
Evandro Menezes 238fa76574 [AArch64] Properly validate the reciprocal estimation.
Add check for legal data types when expanding into a Newton series.

Differential Revision: https://reviews.llvm.org/D22267

llvm-svn: 276041
2016-07-19 22:31:11 +00:00
Ahmed Bougacha 5a59b24bdd [GlobalISel] Mark newly-created gvregs as having a bank.
Also verify that we never try to set the size of a vreg associated
to a register class.

Report an error when we encounter that in MIR. Fix a testcase that
hit that error and had a size for no reason.

llvm-svn: 276012
2016-07-19 19:48:36 +00:00
Simon Pilgrim 5366d0e0bc [X86][AVX512] Added AVX512 subvector broadcast tests
llvm-svn: 275994
2016-07-19 17:04:28 +00:00
Simon Pilgrim f2d02cb0f6 [X86][AVX] Fixed typo in test names
llvm-svn: 275992
2016-07-19 16:52:05 +00:00
Simon Pilgrim 0ea8d275cc [X86][SSE] Reimplement SSE fp2si conversion intrinsics instead of using generic IR
D20859 and D20860 attempted to replace the SSE (V)CVTTPS2DQ and VCVTTPD2DQ truncating conversions with generic IR instead.

It turns out that the behaviour of these intrinsics is different enough from generic IR that this will cause problems: INF/NAN/out-of-range values are guaranteed to result in a 0x80000000 value, which plays havoc with constant folding, which converts them to either zero or UNDEF. This is also an issue with the scalar implementations (which were already generic IR and what I was trying to match).

This patch changes both scalar and packed versions back to using x86-specific builtins.

It also deals with the other scalar conversion cases that are runtime rounding mode dependent and can have similar issues with constant folding.

A companion clang patch is at D22105

Differential Revision: https://reviews.llvm.org/D22106

llvm-svn: 275981
2016-07-19 15:07:43 +00:00
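
To make the 0x80000000 behavior from the fp2si commit above concrete, here is a small Python model of a single-lane truncating float-to-int32 conversion. It is a sketch of the semantics described in the message, not of the intrinsic's implementation; the name `cvtt_to_i32` and the choice to return the raw 32-bit pattern are assumptions made for the example.

```python
import math

INT_INDEFINITE = 0x80000000  # value the commit says INF/NAN/out-of-range produce


def cvtt_to_i32(x: float) -> int:
    """Truncating conversion; returns the 32-bit result as an unsigned pattern."""
    if math.isnan(x) or math.isinf(x):
        return INT_INDEFINITE
    truncated = math.trunc(x)  # round toward zero
    if truncated < -2**31 or truncated > 2**31 - 1:
        return INT_INDEFINITE
    return truncated & 0xFFFFFFFF  # two's-complement bit pattern


for value in (1.9, -1.9, 3e9, float("nan")):
    print(value, hex(cvtt_to_i32(value)))
```

A constant folder that treats the out-of-range cases as zero or UNDEF would disagree with this model, which is the mismatch the commit describes.
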
Sam Parker 6ca4bbb00d [ARM] Refactor Thumb2 Mul and Mla instr descs
Recommitting after r274347 was reverted. This patch introduces some
classes to refactor the 3 and 4 register Thumb2 multiplication
instruction descriptions, plus improved tests for some of those
instructions.

Differential Revision: https://reviews.llvm.org/D21929

llvm-svn: 275979
2016-07-19 14:44:05 +00:00
Simon Pilgrim b87a21f1c3 [AARCH64] Fix linu triple typo
As promised in D22191

llvm-svn: 275976
2016-07-19 14:12:45 +00:00
Simon Pilgrim fc4d4b251d [AARCH64] Enable AARCH64 lit tests on windows dev machines
As discussed on PR27654, this patch fixes the triples of a lot of aarch64 tests and enables lit tests on Windows.

This will hopefully help stop cases where Windows developers break the aarch64 target.

Differential Revision: https://reviews.llvm.org/D22191

llvm-svn: 275973
2016-07-19 13:35:11 +00:00
Daniel Sanders 6a73883c48 [mips] Correct label prefixes for N32 and N64.
Summary:
N32 and N64 follow the standard ELF conventions (.L) whereas O32 uses its own
($).

This fixes the majority of object differences between -fintegrated-as and
-fno-integrated-as.

Reviewers: sdardis

Subscribers: dsanders, sdardis, llvm-commits

Differential Revision: https://reviews.llvm.org/D22412

llvm-svn: 275967
2016-07-19 10:49:03 +00:00
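
The prefix rule stated in the mips commit above fits in a few lines of Python. The function name `private_label_prefix` and the string ABI names are hypothetical; the mapping itself (.L for N32 and N64, $ for O32) is taken from the commit message.

```python
def private_label_prefix(abi: str) -> str:
    """Return the local-label prefix for a MIPS ABI, per the commit message."""
    if abi == "O32":
        return "$"   # O32 uses its own convention
    if abi in ("N32", "N64"):
        return ".L"  # standard ELF convention
    raise ValueError(f"unknown ABI: {abi}")


for abi in ("O32", "N32", "N64"):
    print(abi, private_label_prefix(abi))
```
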
Elena Demikhovsky 2c0780b8e5 AVX-512: Fixed BT instruction selection.
The condition expression (a >> n) & 1 is converted to a "bt a, n" instruction. It works on all Intel targets.
But on AVX-512 it was broken because the expression is modified to (truncate (a >> n) to i1).

I added the new sequence (truncate (a >> n) to i1) to the BT pattern.

Differential Revision: https://reviews.llvm.org/D22354

llvm-svn: 275950
2016-07-19 07:14:21 +00:00
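
The two forms mentioned in the BT commit above compute the same bit, which is why both can be matched to the same instruction. The Python sketch below checks this on small unsigned values; the function names are made up, and truncation to i1 is modeled as keeping only the lowest bit.

```python
def bit_test_and(a: int, n: int) -> int:
    """The (a >> n) & 1 form."""
    return (a >> n) & 1


def bit_test_trunc(a: int, n: int) -> int:
    """The (truncate (a >> n) to i1) form: keep only the lowest bit."""
    return (a >> n) % 2


# Exhaustively compare both forms for 8-bit values.
same = all(bit_test_and(a, n) == bit_test_trunc(a, n)
           for a in range(256) for n in range(8))
print(same)
```
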
Craig Topper d6ca1dc45e [AVX512] Give priority to EVEX encoded PSHUFB over the VEX versions.
llvm-svn: 275942
2016-07-19 02:00:38 +00:00
Matt Arsenault cb540bc03c AMDGPU: Expand register indexing pseudos in custom inserter
This is to help move SILowerControlFlow to before regalloc.
There are a couple of tradeoffs with this. The complete CFG
is visible to more passes, the loop body avoids an extra copy of m0,
vcc isn't required, and immediate offsets can be shrunk into s_movk_i32.

The disadvantage is the register allocator doesn't understand that
the single lane's vector is dead within the loop body, so an extra
register is used to outlive the loop block when expanding the
VGPR -> m0 loop. This also now results in worse waitcnt insertion
before the loop instead of after for pending operations at the point
of the indexing, but that should be fixed by future improvements to
cross block waitcnt insertion.

v_movreld_b32's operands are now modeled more correctly since vdst
is not a true output. This is kind of a hack to treat vdst as a
use operand. Extra checking is required in the verifier since
I can't seem to get tablegen to emit an implicit operand for a
virtual register.

llvm-svn: 275934
2016-07-19 00:35:03 +00:00
Matt Arsenault 50b76399ed AMDGPU: Fix test name and broken CHECK-LABEL
llvm-svn: 275928
2016-07-18 23:09:51 +00:00
Artem Belevich 9f97dcb018 [NVPTX] Make sure we adjust alignment at all call sites
.. including calls from kernel functions that were
ignored by mistake before.

llvm-svn: 275920
2016-07-18 21:58:48 +00:00
Artem Belevich 052b1ed2fd [NVPTX] Force minimum alignment of 4 for byval arguments of device-side functions.
Taking address of a byval variable in PTX is legal, but currently runs
into miscompilation by ptxas on sm_50+ (NVIDIA issue 1789042).
Work around the issue by enforcing minimum alignment on byval arguments
of device functions.

The change is a no-op on SASS level for sm_3x where ptxas already aligns
local copy by at least 4.

Differential Revision: https://reviews.llvm.org/D22428

llvm-svn: 275893
2016-07-18 19:54:56 +00:00
Vitaly Buka c93e10fcbb Revert "[ARM] Skip inline asm memory operands in DAGToDAGISel"
Breaks asan, see https://reviews.llvm.org/D22103

This reverts commit r275776.

llvm-svn: 275890
2016-07-18 19:44:01 +00:00
Vitaly Buka fa474e3eb9 Revert "[ARM] Update test to use CHECK-LABEL. NFCI."
Breaks asan, see https://reviews.llvm.org/D22103

This reverts commit r275777.

llvm-svn: 275889
2016-07-18 19:43:58 +00:00
Simon Pilgrim 069c732f82 [X86][SSE] Regenerate extraction from promotion test
Added tests for SSE2 as well as SSE41

llvm-svn: 275878
2016-07-18 18:53:15 +00:00
Simon Pilgrim a68b8df3a7 [X86][SSE] Regenerate extraction+store memop tests
Added tests for SSE2 as well as SSE41+AVX

llvm-svn: 275876
2016-07-18 18:44:01 +00:00
Simon Pilgrim b21b47ba61 [X86][SSE] Regenerate truncate+extension memop tests
Added tests for SSE2 as well as SSE41

llvm-svn: 275875
2016-07-18 18:42:33 +00:00
Simon Pilgrim 600baaed89 Regenerate test
llvm-svn: 275872
2016-07-18 18:38:51 +00:00
Matt Arsenault c96e1deffa AMDGPU: Add intrinsic for s_flbit_i32/v_ffbh_i32
llvm-svn: 275871
2016-07-18 18:35:05 +00:00
Matt Arsenault 4c519d3518 AMDGPU/R600: Replace barrier intrinsics
llvm-svn: 275870
2016-07-18 18:34:59 +00:00
Matt Arsenault efb24540b1 AMDGPU: Remove dead check in AMDGPUPromoteAlloca
This is currently only called with GEP users. A direct
alloca would only happen with current typed pointers
for arrays which are a perverse case.

Also fix crashes on 0 x and 1 x arrays.

llvm-svn: 275869
2016-07-18 18:34:53 +00:00
Tim Northover 918f05063c CodeGenPrep: use correct function to determine Global's alignment.
Elsewhere (particularly computeKnownBits) we assume that a global will be
aligned to the value returned by Value::getPointerAlignment. This is used to
boost the alignment on memcpy/memset, so any target-specific request can only
increase that value.

llvm-svn: 275866
2016-07-18 18:28:52 +00:00
Simon Pilgrim c941f6b329 [X86][AVX] Add target shuffle decode support for VBROADCAST
Currently we only decode broadcasts from a vector of the same size.

llvm-svn: 275823
2016-07-18 17:32:59 +00:00
Krzysztof Parzyszek 5948ea78b9 [Hexagon] Handle returning small structures by value
This is compliant with the official ABI, but allows experimentation with
calling conventions.

llvm-svn: 275822
2016-07-18 17:30:41 +00:00
Chih-Hung Hsieh 4d9f2c154d [X86] Accept SELECT op code for x86-64 fp128 type
DAGTypeLegalizer::CanSkipSoftenFloatOperand should allow
SELECT op code for x86_64 fp128 type for MME targets,
so SoftenFloatOperand does not abort on SELECT op code.

Differential Revision: http://reviews.llvm.org/D21758

llvm-svn: 275818
2016-07-18 17:20:09 +00:00
Simon Pilgrim 4ac7420618 [X86][AVX2] Added tests that demonstrate duplicate broadcasts
We don't yet decode broadcasts as a target shuffle

llvm-svn: 275808
2016-07-18 16:17:34 +00:00
Krzysztof Parzyszek 786333ffcc [Hexagon] Enable .cur formation in MISched for Hexagon V60
Schedule a load and its use in the same packet in MISched. Previously,
isResourceAvailable was returning false for dependences in the same
packet, which prevented MISched from packetizing a load and its use in
the same packet for v60.

Patch by Ikhlas Ajbar.

llvm-svn: 275804
2016-07-18 16:05:27 +00:00
Nemanja Ivanovic d3c284f645 [PowerPC] Remove redundant direct moves when extracting integers and converting to FP
This patch corresponds to review:
https://reviews.llvm.org/D21354

We use direct moves for extracting integer elements from vectors. We also use
direct moves when converting integers to FP. When these operations are chained,
we get a direct move out of a VSR followed by a direct move back into a VSR.
These are redundant - all we need to do is line up the element and convert.

llvm-svn: 275796
2016-07-18 15:30:00 +00:00
Krzysztof Parzyszek 393b37937b [Hexagon] Use timing class info as tie-breaker in machine scheduler
Patch by Sirish Pande.

llvm-svn: 275794
2016-07-18 15:17:10 +00:00
Krzysztof Parzyszek 3467e9d0a9 [Hexagon] HexagonMachineScheduler should account for resources
The machine scheduler needs to account for available resources
more accurately in order to avoid scheduling an instruction that
forces a new packet to be created.

This occurs in two ways: First, an instruction without an available
resource may have a large priority due to other metrics and be
scheduled when there are other instructions with available resources.
Second, an instruction with a non-zero latency may become available
prematurely. In both these cases, we attempt to change the priority
in order to allow a better instruction to be scheduled.

Patch by Brendon Cahoon.

llvm-svn: 275793
2016-07-18 14:52:13 +00:00
Krzysztof Parzyszek 748d3efec6 [Hexagon] Fix zero latency instructions with multiple predecessors
An instruction may have multiple predecessors that are candidates
for using .cur. However, only one of them can use .cur in the
packet. When this case occurs, we need to make sure that only
one of the dependences gets a 0 latency value.

Patch by Brendon Cahoon.

llvm-svn: 275790
2016-07-18 14:23:10 +00:00
Simon Dardis d32a2d30cb [inlineasm] Propagate operand constraints to the backend
When SelectionDAGISel transforms a node representing an inline asm
block, memory constraint information is not preserved. This can cause
constraints to be broken when a memory offset is of the form:

offset + frame index

when the frame is resolved.

By propagating the constraints all the way to the backend, targets can
enforce memory operands of inline assembly to conform to their constraints.

For MIPSR6, some instructions had their offsets reduced to 9 bits from
16 bits such as ll/sc. This becomes problematic when using inline assembly
to perform atomic operations, as an offset can be generated that is too big to
encode in the instruction.

Reviewers: dsanders, vkalintris

Differential Revision: https://reviews.llvm.org/D21615

llvm-svn: 275786
2016-07-18 13:17:31 +00:00
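
The encoding problem from the inlineasm commit above comes down to a range check. The Python sketch below assumes the offsets are signed two's-complement fields (an assumption, the commit only gives the bit widths) and uses a hypothetical helper `fits_signed` to show an offset that is encodable in 16 bits but not in 9.

```python
def fits_signed(value: int, bits: int) -> bool:
    """True if `value` is representable as a signed field of `bits` bits."""
    return -(1 << (bits - 1)) <= value < (1 << (bits - 1))


offset = 300  # hypothetical frame offset after the frame index is resolved
print(fits_signed(offset, 16))  # fits the 16-bit field
print(fits_signed(offset, 9))   # too large for the 9-bit field
```
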
Nicolai Haehnle bef1ceb815 AMDGPU: Disable AMDGPUPromoteAlloca pass for shader calling conventions.
Summary:
The work item intrinsics are not available for the shader
calling conventions. And even if we did hook them up, most
shader stages have some extra restrictions on the amount
of available LDS.

Reviewers: tstellarAMD, arsenm

Subscribers: nhaehnle, arsenm, llvm-commits, kzhuravl

Differential Revision: https://reviews.llvm.org/D20728

llvm-svn: 275779
2016-07-18 09:02:47 +00:00
Diana Picus 6731f13458 [ARM] Update test to use CHECK-LABEL. NFCI.
llvm-svn: 275777
2016-07-18 07:48:42 +00:00
Diana Picus 73ed44d328 [ARM] Skip inline asm memory operands in DAGToDAGISel
The current logic for handling inline asm operands in DAGToDAGISel interprets
the operands by looking for constants, which should represent the flags
describing the kind of operand we're dealing with (immediate, memory, register
def etc). The operands representing actual data are skipped only if they are
non-const, with the exception of immediate operands which are skipped explicitly
when a flag describing an immediate is found.

The oversight is that memory operands may be const too (e.g. for device drivers
reading a fixed address), so we should explicitly skip the operand following a
flag describing a memory operand. If we don't, we risk interpreting that
constant as a flag, which is definitely not intended.

Fixes PR26038

Differential Revision: https://reviews.llvm.org/D22103

llvm-svn: 275776
2016-07-18 07:35:14 +00:00
Craig Topper a3c55f5915 [AVX512] Add EVEX versions of scalar ADD/SUB/MUL/DIV to load folding tables.
llvm-svn: 275775
2016-07-18 06:49:32 +00:00
Craig Topper 83613bb436 [X86] Fix test checks to include leading 'v' on avx mnemonic names.
llvm-svn: 275774
2016-07-18 06:49:29 +00:00
Diana Picus 774d157a5d [ARM] Honour ABI for rem under -O0 for EABI, GNUEABI, Android and Musl
At higher optimization levels, we generate the libcall for DIVREM_Ix, which is
fine: aeabi_{u|i}divmod. At -O0 we generate the one for REM_Ix, which is the
default {u}mod{q|h|s|d}i3.

This commit makes sure that we don't generate REM_Ix calls for ABIs that
don't support them (i.e. where we need to use DIVREM_Ix instead). This is
achieved by bailing out of FastISel, which can't handle non-double multi-reg
returns, and letting the legalization infrastructure expand the REM_Ix calls.

It also updates the divmod-eabi.ll test to run under -O0 as well, and adds some
Windows checks to it to make sure we don't break things for it.

Fixes PR27068

Differential Revision: https://reviews.llvm.org/D21926

llvm-svn: 275773
2016-07-18 06:48:25 +00:00
Craig Topper 1af6cc00dc [X86] Add VPADD instructions to X86InstrInfo::isAssociativeAndCommutative.
llvm-svn: 275769
2016-07-18 06:14:54 +00:00
Craig Topper ba9b93d7f2 [X86] Add floating point packed logical ops to X86InstrInfo::isAssociativeAndCommutative.
llvm-svn: 275768
2016-07-18 06:14:50 +00:00
Craig Topper 3a99de4067 [X86] Add AVX512 instructions to X86InstrInfo::isAssociativeAndCommutative.
llvm-svn: 275767
2016-07-18 06:14:47 +00:00
Craig Topper f7a06c29bc [X86] Add AVX512 load opcodes and a couple AVX load opcodes to X86InstrInfo::areLoadsFromSameBasePtr.
llvm-svn: 275765
2016-07-18 06:14:43 +00:00
Craig Topper 650a15e2b3 [X86] Add more opcodes to isFrameLoadOpcode/isFrameStoreOpcode. Mainly AVX-512 related.
llvm-svn: 275764
2016-07-18 06:14:39 +00:00
Craig Topper 5c913e84df [AVX512] Use VMOVAPSZ128rr/VMOVAPS256rr for VR128X/VR256X physreg moves when VLX is supported.
Ideally we would use VEX encoded moves instead of EVEX if the high 16 registers aren't referenced, but this is a good first step.

llvm-svn: 275763
2016-07-18 06:14:34 +00:00
Simon Pilgrim 5aa90c55b6 [X86][AVX] Added VBROADCASTF128/VBROADCASTI128 tests
llvm-svn: 275713
2016-07-17 17:44:18 +00:00
Simon Pilgrim d1e941ae85 [X86] Regenerated ctlz/cttz scalar tests for 32/64-bit targets with/without LZCNT/TZCNT support
llvm-svn: 275710
2016-07-17 16:15:51 +00:00
Simon Pilgrim 0bf66c9d62 [X86] Regenerated popcnt scalar tests for 32/64-bit targets with/without POPCNT support
llvm-svn: 275709
2016-07-17 16:04:19 +00:00
Elena Demikhovsky eaa356501d X86: Updated a test file. NFC.
This test shows suboptimal code generated for AVX-512 vs PENTIUM4.
The issue will be fixed in an upcoming commit.

llvm-svn: 275702
2016-07-17 07:03:13 +00:00
Hal Finkel 04b5330ccd Disable this-return argument forwarding on ARM/AArch64
r275042 reverted function-attribute inference for the 'returned' attribute
because the feature triggered self-hosting failures on ARM and AArch64. James
Molloy determined that the this-return argument forwarding feature, which
directly ties the returned input argument to the returned value, was the cause.
It seems likely that this forwarding code contains, or triggers, a subtle bug.
Disabling for now until we can track that down.

llvm-svn: 275677
2016-07-16 07:07:29 +00:00
Yaxun Liu a711cc7951 Re-commit [AMDGPU] Add metadata for runtime
Attempting to fix lit test failure on ppc.

llvm-svn: 275676
2016-07-16 05:09:21 +00:00
Matthias Braun 538859cca3 llc: Add support for -run-pass none
This does not schedule any passes besides the ones necessary to
construct and print the machine function. This is useful to test .mir
file reading and printing.

Differential Revision: http://reviews.llvm.org/D22432

llvm-svn: 275664
2016-07-16 02:24:59 +00:00
Matthias Braun c92a5fc9f6 ARM/MIR: Move test from MIR to CodeGen/ARM directory
test/CodeGen/MIR/ARM/ARMLoadStoreDBG.mir is an actual test for the ARM
load store optimization pass and not a test of the mir parser/printer.

It belongs to test/CodeGen/ARM; This also updates the test to use the
new -run-pass llc syntax.

llvm-svn: 275662
2016-07-16 02:24:13 +00:00
Matthias Braun 5d00b3213e MIParser: reject subregister indexes on physregs
llvm-svn: 275658
2016-07-16 01:36:18 +00:00
Matt Arsenault 73d2f8954a AMDGPU: Fix verifier error from partially undef copy
In this situation:

%VGPR2<def> = BUFFER_LOAD_DWORD_OFFSET %SGPR8_SGPR9_SGPR10_SGPR11,
%VGPR7<def,tied3> = V_MAC_F32_e32 %VGPR0<undef>, %VGPR1<kill>, %VGPR7<kill,tied0>, %EXEC<imp-use>
%VGPR3_VGPR4_VGPR5_VGPR6<def> = COPY %VGPR0_VGPR1_VGPR2_VGPR3
%VGPR4<def> = COPY %VGPR2

The copy for VGPR1 -> VGPR4 was an error from reading undefined VGPR1,
but VGPR4 is defined immediately after this copy.

llvm-svn: 275635
2016-07-15 22:32:02 +00:00
Michael Kuperstein be2e3f5ce5 ExpandPostRAPseudos should transfer implicit uses, not only implicit defs
Previously, we would expand:
%BL<def> = COPY %DL<kill>, %EBX<imp-use,kill>, %EBX<imp-def>
Into:
%BL<def> = MOV8rr %DL<kill>, %EBX<imp-def>
Dropping the imp-use on the floor.

That confused CriticalAntiDepBreaker, which (correctly) assumes that if an
instruction defs but doesn't use a register, that register is dead immediately
before the instruction - while in this case, the high lanes of EBX can be very
much alive.

This fixes PR28560.

Differential Revision: https://reviews.llvm.org/D22425

llvm-svn: 275634
2016-07-15 22:31:14 +00:00
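
The effect of dropping the implicit use in the ExpandPostRAPseudos commit above can be shown with a toy model in Python. Instructions are plain dictionaries here, and `expand_copy` and `looks_dead_before` are hypothetical helpers; the liveness rule is the one the commit message attributes to CriticalAntiDepBreaker.

```python
def expand_copy(copy, transfer_implicit_uses=True):
    """Expand a COPY into MOV8rr, optionally dropping implicit uses (old bug)."""
    return {
        "opcode": "MOV8rr",
        "dst": copy["dst"],
        "src": copy["src"],
        "implicit_defs": list(copy["implicit_defs"]),
        "implicit_uses": (list(copy["implicit_uses"])
                          if transfer_implicit_uses else []),
    }


def looks_dead_before(inst, reg):
    """A register that is defined but not used is assumed dead before inst."""
    return reg in inst["implicit_defs"] and reg not in inst["implicit_uses"]


copy = {"opcode": "COPY", "dst": "BL", "src": "DL",
        "implicit_defs": ["EBX"], "implicit_uses": ["EBX"]}

print(looks_dead_before(expand_copy(copy, transfer_implicit_uses=False), "EBX"))
print(looks_dead_before(expand_copy(copy), "EBX"))
```
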
Matt Arsenault a65e6b8335 AMDGPU: Remove brev intrinsic
llvm-svn: 275620
2016-07-15 21:27:13 +00:00
Matt Arsenault 82e5e1e564 AMDGPU: Fix TargetPrefix for remaining r600 intrinsics
llvm-svn: 275619
2016-07-15 21:27:08 +00:00
Matt Arsenault 11d3e21f2b AMDGPU: Remove AMDGPU.ldexp
llvm-svn: 275618
2016-07-15 21:26:56 +00:00
Matt Arsenault 09b2c4aee8 AMDGPU: Remove legacy rsq.clamped intrinsic
Mesa still has a use of llvm.AMDGPU.rsq.f64 remaining.

Also fix mismatch with non-IEEE rsq selecting to IEEE rsq.

llvm-svn: 275617
2016-07-15 21:26:52 +00:00
Saleem Abdulrasool 467269a40e CodeGen: avoid emitting unnecessary CFI
Remove unnecessary clutter in assembly output.  When using SjLj EH, the CFI is
not actually used for anything.  Do not emit the CFI needlessly.  The minor test
adjustments are interesting.  The prologue test was just overzealous matching.
The interesting case is the LSDA change.  It was originally added to ensure that
various compilations did not mangle the name (it explicitly checked the name!).
However, subsequent cleanups made it more reliant on the CFI to find the name.
Parse the generated code flow to generically find the label still.

llvm-svn: 275614
2016-07-15 21:10:29 +00:00
Nico Weber 8d66df15f4 Teach fast isel about the win64 calling convention.
This mostly just works.

Vectorcall rets are still not supported.

The win64_eh test change is because fast isel doesn't use rsi for temporary
computations, so it doesn't need to be pushed. The test case I'm changing was
originally added to test pushes, but by now there are other test cases in that
file exercising that code path.

https://reviews.llvm.org/D22422

llvm-svn: 275607
2016-07-15 20:18:37 +00:00
Vitaly Buka 7f64844481 Revert "[AMDGPU] Add metadata for runtime"
This reverts commit r275566.

llvm-svn: 275599
2016-07-15 19:14:57 +00:00
Krzysztof Parzyszek bba0bf7d37 [Hexagon] Improve patterns with stack-based addressing
- Treat bitwise OR with a frame index as an ADD wherever possible, fold it
  into addressing mode.
- Extend patterns for memops to allow memops with frame indexes as address
  operands.

llvm-svn: 275569
2016-07-15 15:35:52 +00:00
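
Treating a bitwise OR as an ADD, as in the Hexagon commit above, is valid when the two operands share no set bits, because no carries can then occur. The Python sketch below checks that condition; the helper name and the example addresses are hypothetical.

```python
def or_equals_add(base: int, offset: int) -> bool:
    """True when base | offset is guaranteed to equal base + offset."""
    return (base & offset) == 0  # no common bits, so addition never carries


base = 0x1000  # hypothetical 8-byte-aligned frame address
offset = 4
if or_equals_add(base, offset):
    print(hex(base | offset), hex(base + offset))
```

When the operands do overlap (for example base 0x1004 with offset 4), the OR and the ADD give different results, so the fold would not be safe.
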
Nico Weber f7f2b81602 In dag-optnone.ll, use varargs instead of win64 to force SDIsel.
The test used to rely on targeting win64 to disable fast isel,
but I'd like to teach fast isel about win64 rets.  Change the
test to use varargs to disable fast isel.

llvm-svn: 275568
2016-07-15 15:30:18 +00:00
Yaxun Liu b3d17690eb [AMDGPU] Add metadata for runtime
Added emitting metadata to elf for runtime.

Runtime requires certain information (metadata) about kernels to be able to execute and query them. Such information is emitted to an elf section as a key-value pair stream.

Differential Revision: https://reviews.llvm.org/D21849

llvm-svn: 275566
2016-07-15 14:58:21 +00:00
Simon Pilgrim efd841e294 [X86][AVX] Added shuffle tests for UNPCK+PERMUTE
lowerVectorShuffleAsPermuteAndUnpack could solve this if it worked with 256-bit vectors

llvm-svn: 275554
2016-07-15 11:51:46 +00:00
Simon Pilgrim cf9c31550c [X86][AVX2] Added a memory version of test_mm256_broadcastsi128_si256
This should lower to vbroadcasti128

llvm-svn: 275552
2016-07-15 11:40:27 +00:00
Simon Pilgrim 2683ad54ad [X86][AVX2] Improve lowerShuffleAsRepeatedMaskAndLanePermute permutation of 64-bit sub-lanes
As discussed on PR28136, lowerShuffleAsRepeatedMaskAndLanePermute was attempting to match repeated masks at the 128-bit level and then permute the resultant lanes at the 128-bit (AVX1) or 64-bit (AVX2) sub-lane level.

This change allows us to create the repeated masks at the sub-lane level (and then concat them together to create a 128-bit repeated mask) and then select which sub-lane to permute. This has no effect on the AVX1 codegen.

Fixes PR28136.

llvm-svn: 275543
2016-07-15 09:49:12 +00:00
James Molloy b3326df56a [Thumb-1] Select post-increment load and store where possible
Thumb-1 doesn't have post-inc or pre-inc load or store instructions. However the LDM/STM instructions with writeback can function as post-inc load/store:

  ldm r0!, {r1}  @ load from r0 into r1 and increment r0 by 4

Obviously, this only works if the post increment is 4.

llvm-svn: 275540
2016-07-15 08:03:56 +00:00
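Note on the Thumb-1 entry above: the equivalence between `ldm r0!, {r1}` and a post-increment load can be modelled in a few lines. This is a toy sketch only; the dictionary-based memory and register file and both function names are assumptions made for illustration.

```python
def ldm_writeback(memory: dict, regs: dict, base: str, dest: str) -> None:
    """Model `ldm base!, {dest}`: load one word, then advance base by 4."""
    regs[dest] = memory[regs[base]]
    regs[base] += 4


def post_inc_load(memory: dict, regs: dict, base: str, dest: str, inc: int) -> None:
    """Model a generic post-increment load with an arbitrary increment."""
    regs[dest] = memory[regs[base]]
    regs[base] += inc


memory = {0x100: 0xAAAA, 0x104: 0xBBBB}
regs_ldm = {"r0": 0x100, "r1": 0}
regs_post = {"r0": 0x100, "r1": 0}

ldm_writeback(memory, regs_ldm, "r0", "r1")
post_inc_load(memory, regs_post, "r0", "r1", 4)

# The two forms agree only because the increment is exactly 4.
assert regs_ldm == regs_post
```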
James Molloy a454a11d60 [ARM] Prefer indirect calls in minsize mode
... When we emit several calls to the same function in the same basic block.

An indirect call uses a "BLX r0" instruction which has a 16-bit encoding. If many calls are made to the same target, this can enable significant code size reductions.

llvm-svn: 275537
2016-07-15 07:55:21 +00:00
Matt Arsenault b91805ea2b AMDGPU: Fix not expanding control flow after some kill blocks
Also stop trying to insert skip blocks at end_cf. This
was inserting them at the end of the block which doesn't make
sense. The skip should be inserted at the beginning of the block
right after the end cf. Just remove this for now since no tests
seem to stress this and I think this can be handled more generally
later.

Fixes bug 28550

llvm-svn: 275510
2016-07-15 00:58:15 +00:00
Matt Arsenault fa5a86a403 AMDGPU: Fix trying to skip from a block with no successors
Found while reducing bug 28550

llvm-svn: 275509
2016-07-15 00:58:13 +00:00
Matt Arsenault 83ab049af2 AMDGPU: Fix splitting kill blocks with defs before kill
llvm-svn: 275508
2016-07-15 00:58:09 +00:00
Simon Pilgrim 420b266d0a [X86][AVX2] Allow VPERMPD/VPERMQ shuffles to call combineShuffle (reapplied)
This improves the situation discussed in D19228 where we were forcing VPERMPD/VPERMQ where VPERM2F128/VPERM2I128 would have been better.

This was incorrectly reverted in rL275421 during triage of PR28552.

llvm-svn: 275497
2016-07-14 23:05:09 +00:00
Krzysztof Parzyszek ecea07c50e [Hexagon] Packetize function call arguments with tail call instructions
On Hexagon it is legal to packetize the instructions setting up call
arguments with the call instruction itself. This was already done,
except for tail calls. Make sure tail calls are handled as well.

llvm-svn: 275458
2016-07-14 19:30:55 +00:00
Sanjay Patel 2996a342f3 auto-generate checks
Note: I removed the checks after each jump because that's noise, but we apparently 
need branches rather than returning i1 to see the bt codegen in some cases.

llvm-svn: 275439
2016-07-14 17:07:55 +00:00
Nico Weber 5bb284226b Don't optimize movs to pushes in -O0 builds.
https://reviews.llvm.org/D22362

llvm-svn: 275431
2016-07-14 15:40:22 +00:00
Nico Weber 3afaf16abc Revert r275411, it caused PR28552.
llvm-svn: 275421
2016-07-14 14:49:35 +00:00
Nico Weber ecdf45b1e6 Teach fast isel calls and rets about stdcall.
stdcall is callee-pop like thiscall, so the thiscall changes already did most
of the work for this.  This change only opts stdcall in and adds tests.

llvm-svn: 275414
2016-07-14 13:54:26 +00:00
Simon Pilgrim bed37ccd54 [X86][AVX] Added an additional vperm2f128 memory folding test
llvm-svn: 275413
2016-07-14 13:40:53 +00:00
Simon Pilgrim 3ecb6bdd5f [X86][AVX2] Allow VPERMPD/VPERMQ shuffles to call combineShuffle
This improves the situation discussed in D19228 where we were forcing VPERMPD/VPERMQ where VPERM2F128/VPERM2I128 would have been better.

llvm-svn: 275411
2016-07-14 13:28:43 +00:00
Daniel Sanders 46fe6550ac [mips] SelectionDAGISel subclasses now follow the optimization level.
Summary:
It was recently discovered that, for Mips's SelectionDAGISel subclasses,
all optimization levels caused SelectionDAGISel to behave like -O2.

This change adds the necessary plumbing to initialize the optimization level.

Reviewers: andrew.w.kaylor

Subscribers: andrew.w.kaylor, sdardis, dean, llvm-commits, vradosavljevic, petarj, qcolombet, probinson, dsanders

Differential Revision: https://reviews.llvm.org/D14900

llvm-svn: 275410
2016-07-14 13:25:22 +00:00
Simon Pilgrim 053d32906f [X86][AVX] Add support for narrowing 128-bit+ shuffle mask elements to 64-bits to allow combining
Primarily this is to allow blend with zero instead of having to use vperm2f128, but we can use this in the future to deal with AVX512 cases where we need to keep the original element size to correctly fold masked operations.

llvm-svn: 275406
2016-07-14 12:58:04 +00:00
Simon Pilgrim 700e4a1ab8 [X86][AVX] Add 128-bit wide shuffle tests that should combine to blend-with-zero
llvm-svn: 275402
2016-07-14 12:21:40 +00:00
Simon Pilgrim a76a8e50e5 [X86][AVX] Add VBROADCASTF128/VBROADCASTI128 shuffle comments support
llvm-svn: 275400
2016-07-14 12:07:43 +00:00
Simon Pilgrim 9e812169cc [X86][AVX] Regenerate broadcast upgrade tests
llvm-svn: 275398
2016-07-14 11:05:43 +00:00
Eli Friedman 17e8ea18e9 [X86] Fix stupid typo in isel lowering.
Apparently someone miscounted the number of zeros in the immediate.
Fixes https://llvm.org/bugs/show_bug.cgi?id=28544 .

llvm-svn: 275376
2016-07-14 05:48:25 +00:00
Matt Arsenault ca7f5701f8 AMDGPU/R600: Delete/rename intrinsics no longer used by mesa
Use the replacement pass to update the tests, and delete old names.

llvm-svn: 275375
2016-07-14 05:47:17 +00:00
Matt Arsenault 897eee4187 AMDGPU: Remove unused intrinsics
llvm-svn: 275371
2016-07-14 05:23:19 +00:00
Matt Arsenault aa94c1e7ee AMDGPU: Fix test not actually testing anything
It wasn't actually running the pass, and since it is
missing the llvm prefix, the eh intrinsic was not
really an IntrinsicInst.

Also add missing test for lifetime markers.

llvm-svn: 275370
2016-07-14 05:23:15 +00:00
Dean Michael Berris 52735fc435 XRay: Add entry and exit sleds
Summary:
In this patch we implement the following parts of XRay:

- Supporting a function attribute named 'function-instrument' which currently only supports 'xray-always'. We should be able to use this attribute for other instrumentation approaches.
- Supporting a function attribute named 'xray-instruction-threshold' used to determine whether a function is instrumented with a minimum number of instructions (IR instruction counts).
- X86-specific nop sleds as described in the white paper.
- A machine function pass that adds the different instrumentation marker instructions at a very late stage.
- A way of identifying which return opcode is considered "normal" for each architecture.

There are some caveats here:

1) We don't handle PATCHABLE_RET on platforms other than x86_64 yet -- this means if IR used PATCHABLE_RET directly instead of a normal ret, instruction lowering for that platform might do the wrong thing. We think this should be handled at instruction selection time so that it is unpacked by default on platforms where XRay is not available yet.

2) The generated section for X86 is different from what is described in the white paper for the sole reason that LLVM allows us to do this neatly. We're taking the opportunity to deviate from the white paper in this respect to allow us to get richer information from the runtime library.

Reviewers: sanjoy, eugenis, kcc, pcc, echristo, rnk

Subscribers: niravd, majnemer, atrick, rnk, emaste, bmakam, mcrosier, mehdi_amini, llvm-commits

Differential Revision: http://reviews.llvm.org/D19904

llvm-svn: 275367
2016-07-14 04:06:33 +00:00
Nico Weber af7e8465e1 Teach fast isel about thiscall (and callee-pop) calls.
http://reviews.llvm.org/D22315

llvm-svn: 275360
2016-07-14 01:52:51 +00:00
Mehdi Amini 9e332a7719 Add missing test for r275347 "[IPRA] Set callee saved registers to none for local function when IPRA is enabled."
llvm-svn: 275358
2016-07-14 01:31:20 +00:00
Michael Kuperstein be837fa40f [DAG] Correctly chain masked loads
If a masked load is not added to the chain, it should not reset the chain's
root.

This fixes the remaining part of PR28515.

llvm-svn: 275340
2016-07-13 23:23:40 +00:00
Quentin Colombet 68a84587c5 [MIR] Fix one GlobalISel test case that I missed in r275314.
llvm-svn: 275333
2016-07-13 22:35:33 +00:00
Nico Weber b888555bcc Add a triple to fix test on bots after 275320.
llvm-svn: 275327
2016-07-13 22:19:40 +00:00
Nico Weber eb9488b151 Fix a TODO in X86CallFrameOptimization to not rely on a codegen artifact.
This happens to make X86CallFrameOptimization run in -O0 / FastISel builds as well,
but it's not clear if the pass should run in that setup.

http://reviews.llvm.org/D22314

llvm-svn: 275320
2016-07-13 21:38:27 +00:00
Quentin Colombet 545e558b82 [MIR] Print on the given output instead of stderr.
Currently the MIR framework prints all its outputs (errors and actual
representation) on stderr.

This patch fixes that by printing the regular output in the output
specified with -o.

Differential Revision: http://reviews.llvm.org/D22251

llvm-svn: 275314
2016-07-13 20:36:03 +00:00
Matt Arsenault f071102647 AMDGPU: Remove last AMDIL intrinsics
llvm-svn: 275309
2016-07-13 19:42:06 +00:00
Andrew Kaylor 346dd7f1bd Reverting r275284 due to platform-specific test failures
llvm-svn: 275304
2016-07-13 19:09:16 +00:00
Simon Pilgrim 5d664af3c3 [X86][SSE] Regenerate truncated shift test
Check SSE2 and AVX2 implementations

llvm-svn: 275300
2016-07-13 18:50:10 +00:00
Simon Pilgrim 631643e7d9 Regenerate test
llvm-svn: 275299
2016-07-13 18:46:37 +00:00
Krzysztof Parzyszek cb4dd7656b Move mempcpy_call.ll to X86 subdirectory
llvm-svn: 275294
2016-07-13 18:28:45 +00:00
Andrew Kaylor 12cccdd731 Fix for Bug 26903, adds support to inline __builtin_mempcpy
Patch by Sunita Marathe

Differential Revision: http://reviews.llvm.org/D21920

llvm-svn: 275284
2016-07-13 17:25:11 +00:00
Matthias Braun 512424f28a PatchableFunction: Skip pseudos that do not create code
This fixes http://llvm.org/PR28524

llvm-svn: 275278
2016-07-13 16:37:29 +00:00
Sanjay Patel 610a2f6525 [x86][SSE/AVX] optimize pcmp results better (PR28484)
We know that pcmp produces all-ones/all-zeros bitmasks, so we can use that behavior to avoid unnecessary constant loading.

One could argue that load+and is actually a better solution for some CPUs (Intel big cores) because shifts don't have the
same throughput potential as load+and on those cores, but that should be handled as a CPU-specific later transformation if
it ever comes up. Removing the load is the more general x86 optimization. Note that the uneven usage of vpbroadcast in the
test cases is filed as PR28505:
https://llvm.org/bugs/show_bug.cgi?id=28505

Differential Revision: http://reviews.llvm.org/D22225

llvm-svn: 275276
2016-07-13 16:04:07 +00:00
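Note on the pcmp entry above: because a packed compare yields either all-ones or all-zeros per lane, a zero-extended boolean can be produced with a logical shift instead of an AND against a loaded constant. The sketch below models one 32-bit lane; the lane width and all names are assumptions for illustration, and the exact instruction sequence the patch emits is not shown in the message.

```python
LANE_BITS = 32
ALL_ONES = (1 << LANE_BITS) - 1


def pcmpeq_lane(a: int, b: int) -> int:
    """Model one lane of a packed compare: all-ones if equal, else zero."""
    return ALL_ONES if a == b else 0


def zext_via_and(mask: int) -> int:
    """Boolean via AND; in vector code the constant 1 must be loaded."""
    return mask & 1


def zext_via_shift(mask: int) -> int:
    """Boolean via logical right shift; no constant vector is needed."""
    return mask >> (LANE_BITS - 1)


for a, b in [(7, 7), (7, 8), (0, 0), (ALL_ONES, 0)]:
    mask = pcmpeq_lane(a, b)
    assert zext_via_and(mask) == zext_via_shift(mask) == int(a == b)
```

The equivalence holds only for the two values a compare can produce; for an arbitrary lane value the low bit and the high bit differ in general.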
Simon Pilgrim a99368fa35 [X86][AVX512] Add support for VPERMILPD/VPERMILPS variable shuffle mask comments
llvm-svn: 275272
2016-07-13 15:45:36 +00:00
Simon Pilgrim 48d8340760 [X86][AVX] Add support for target shuffle combining to VPERMILPS variable shuffle mask
Added AVX512F VPERMILPS shuffle decoding support

llvm-svn: 275270
2016-07-13 15:10:43 +00:00
Tom Stellard 418beb7671 AMDGPU/SI: Add support for R_AMDGPU_GOTPCREL
Reviewers: rafael, ruiu, tony-tye, arsenm, kzhuravl

Subscribers: arsenm, llvm-commits, kzhuravl

Differential Revision: http://reviews.llvm.org/D21484

llvm-svn: 275268
2016-07-13 14:23:33 +00:00
Matt Arsenault 0056868c4a AMDGPU: Fold out no-op kill intrinsics
llvm-svn: 275253
2016-07-13 06:04:22 +00:00
Tim Northover 72eebfa4b0 GlobalISel: freeze reserved regs after IRTranslator.
We can freeze the registers after the MachineFrameInfo has been configured (by
telling it about calls, inline asm, ...). This doesn't happen at all yet, but
will be part of IR translation.

Fixes -verify-machineinstrs assertion.

llvm-svn: 275221
2016-07-12 22:23:42 +00:00
Matt Arsenault 786724a22e AMDGPU: Follow up to r275203
I meant to squash this into it.

llvm-svn: 275220
2016-07-12 21:41:32 +00:00
Nemanja Ivanovic f0407e3902 The test case I added is PowerPC specific but I accidentally
had it in the wrong directory. Moved it to CodeGen/PowerPC.

Sorry about the noise.

llvm-svn: 275218
2016-07-12 21:24:08 +00:00
Nemanja Ivanovic b43bb6141e [Power9] Add codegen for VSX word insert/extract instructions
This patch corresponds to review:
http://reviews.llvm.org/D20239

It adds exploitation of XXINSERTW and XXEXTRACTUW instructions that
are useful in some cases for inserting and extracting vector elements of
v4[if]32 vectors.

llvm-svn: 275215
2016-07-12 21:00:10 +00:00
Simon Pilgrim 6fa71da4a4 [X86][AVX] Add support for target shuffle combining to VPERM2F128/VPERM2I128
llvm-svn: 275212
2016-07-12 20:27:32 +00:00
Matthias Braun 96ec47db74 X86FixupBWInsts: No need for forward liveness analysis.
With r274952 and r275201 in place there are no cases left where a
forward liveness analysis yields different results than a backward one.
So we can remove the forward stepping logic.

Differential Revision: http://reviews.llvm.org/D22083

llvm-svn: 275204
2016-07-12 19:04:30 +00:00
Matt Arsenault 657f871a4e AMDGPU: Fix verifier error with kill intrinsic
Don't create a terminator in the middle of the block.
We should probably get rid of this intrinsic.

llvm-svn: 275203
2016-07-12 19:01:23 +00:00
Wei Ding 5b2636a152 AMDGPU: Add LLVM IR Intrinsic for v_lerp_u8
Differential Revision: http://reviews.llvm.org/D22239

llvm-svn: 275197
2016-07-12 18:02:14 +00:00
Haicheng Wu 711ca868fc [AArch64] Set FMOVS0 and FMOVD0 as isAsCheapAsAMove when needed.
If a subtarget has both ZCZeroing and CustomCheapAsMoveHandling features (now
only Kryo has both), set FMOVS0 and FMOVD0 isAsCheapAsAMove.

Differential Revision: http://reviews.llvm.org/D22256

llvm-svn: 275178
2016-07-12 15:31:41 +00:00
Nemanja Ivanovic eebbcb6d57 [PowerPC] Canonicalize applicable vector shift immediates as swaps
This patch corresponds to review:
http://reviews.llvm.org/D21358

Vector shifts that have the same semantics as a vector swap are canonicalized
as such to provide additional opportunities for swap removal optimization to
remove unnecessary swaps.

llvm-svn: 275168
2016-07-12 12:16:27 +00:00
Nicolai Haehnle 7968c34586 AMDGPU: Unify MOVRELSOffset and MOVRELDOffset
Summary:
Previously, constant index insertelements would be turned into SI_INDIRECT_DST,
which is bound to prevent some optimization opportunities. Worse, it misled
the heuristic that decides whether immediates should be lowered to S_MOV_B32
or V_MOV_B32 in a way that resulted in unnecessary v_readfirstlanes.

Reviewers: arsenm, tstellarAMD

Subscribers: arsenm, kzhuravl, llvm-commits

Differential Revision: http://reviews.llvm.org/D22217

llvm-svn: 275160
2016-07-12 08:12:16 +00:00
Craig Topper a6e6febe2c [AVX512] Remove masked logic op intrinsics and autoupgrade them to native IR.
llvm-svn: 275155
2016-07-12 05:27:53 +00:00
NAKAMURA Takumi e92e2124f6 llvm/test/CodeGen/AMDGPU/selected-stack-object.ll REQUIRES +Asserts, since it expects assertion failure.
llvm-svn: 275144
2016-07-12 02:18:09 +00:00
Haicheng Wu 1e39574e9f [Kryo] Enable ZCZeroing feature
This feature uses immediate #0 to zero a register.

Differential Revision: http://reviews.llvm.org/D19985

llvm-svn: 275143
2016-07-12 02:04:01 +00:00
Nico Weber c7bf646a99 Teach FastISel about thiscall (and, hence, about callee-pop).
http://reviews.llvm.org/D22115

llvm-svn: 275135
2016-07-12 01:30:35 +00:00
Matt Arsenault 45f8216cee AMDGPU: Remove superfluous string attributes from tests
Also fix v_mac.ll not testing right thing for fneg

llvm-svn: 275129
2016-07-11 23:35:48 +00:00
Nicolai Haehnle c06bfa1daa AMDGPU: Treat texture gather instructions more like other MIMG instructions
Summary:
Setting MIMG to 0 has a bunch of unexpected side effects, including that
isVMEM returns false which leads to incorrect treatment in the hazard
recognizer. The reason I noticed it is that it also leads to incorrect
treatment in VGPR-to-SGPR copies, which is one cause of the referenced bug.

The only reason why MIMG was set to 0 is to signal the special handling of
dmasks, but that can be checked differently.

Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=96877

Reviewers: arsenm, tstellarAMD

Subscribers: arsenm, kzhuravl, llvm-commits

Differential Revision: http://reviews.llvm.org/D22210

llvm-svn: 275113
2016-07-11 21:59:43 +00:00
Nicolai Haehnle f52c3cf272 AMDGPU: fix local stack slot allocation bugs
Summary:
The main bug fix here is using the 32-bit encoding of V_ADD_I32 in
materializeFrameBaseRegister and resolveFrameIndex, so that arbitrary
immediates work.

The second part is that we may now require the SegmentWaveByteOffset
even when there are initially no stack objects and VGPR spilling isn't
enabled, for stack slots that are allocated later. This means that some
bits become effectively dead and can be cleaned up.

Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=96602
Tested-by: Kai Wasserbäch <kai@dev.carbon-project.org>

Reviewers: arsenm, tstellarAMD

Subscribers: arsenm, llvm-commits, kzhuravl

Differential Revision: http://reviews.llvm.org/D21551

llvm-svn: 275108
2016-07-11 21:44:40 +00:00
Quentin Colombet fb82c7bc94 [X86] Fix tailcall return address clobber bug.
This bug (llvm.org/PR28124) was introduced by r237977, which refactored
the tail call sequence to be generated in two passes instead of one.

Unfortunately, the stack adjustment produced by the first pass was not
recognized by X86FrameLowering::mergeSPUpdates() in all cases, causing
code such as the following, which clobbers the return address, to be
generated:

popl    %edi
popl    %edi
pushl   %eax
jmp     tailcallee              # TAILCALL

To fix the problem, the entire stack adjustment is performed in
X86ExpandPseudo::ExpandMI() for tail calls.

Patch by Magnus Lång <margnus1@gmail.com>

Differential Revision: http://reviews.llvm.org/D21325

llvm-svn: 275103
2016-07-11 21:03:03 +00:00
Michael Kuperstein cfbac5f361 [X86] Disable FixupSetCC for CodeGenOpt::None
It is an optimization pass, and should not run at -O0. Especially since Fast RA
will not do the required register coalescing anyway, so it's a loss even from
the optimization standpoint.

This also works around (but doesn't quite fix) PR28489.

llvm-svn: 275099
2016-07-11 20:40:44 +00:00
Chad Rosier 4f0dad1674 [IPRA] Properly compute register usage at call sites.
Differential Revision: http://reviews.llvm.org/D21395
Patch by Vivek Pandya.
PR28144

llvm-svn: 275087
2016-07-11 18:45:49 +00:00
Zhan Jun Liau def708a0f9 [SystemZ] Recognize Load On Condition Immediate (LOCHI/LOCGHI) opportunities
Summary: Add support for the z13 instructions LOCHI and LOCGHI which
conditionally load immediate values.  Add target instruction info hooks so
that if conversion will allow predication of LHI/LGHI.

Author: RolandF

Reviewers: uweigand

Subscribers: zhanjunl

Committing on behalf of Roland.

Differential Revision: http://reviews.llvm.org/D22117

llvm-svn: 275086
2016-07-11 18:45:03 +00:00
Sanjay Patel 8f1d408c74 [x86] make some of the tests 256-bit for testing diversity
llvm-svn: 275070
2016-07-11 15:08:37 +00:00
Sanjay Patel b428951990 [x86] specify triple to avoid bot failures
llvm-svn: 275067
2016-07-11 14:17:54 +00:00
Sanjay Patel 0d38830aca [x86] update checks
llvm-svn: 275064
2016-07-11 14:07:31 +00:00
Zlatko Buljan cba9f80ba8 [mips][microMIPS] Implement LDC1, SDC1, LDC2, SDC2, LWC1, SWC1, LWC2 and SWC2 instructions and add CodeGen support
Differential Revision: http://reviews.llvm.org/D18824

llvm-svn: 275050
2016-07-11 07:41:56 +00:00
Elena Demikhovsky d84f337953 AVX-512: DAG lowering for scalar MIN/MAX commutable ops
DAG lowering was missing for the scalar FMINC, FMAXC nodes.
The nodes are generated only in the "unsafe-fp-math" mode.
Added tests.

llvm-svn: 275048
2016-07-11 06:08:06 +00:00
Craig Topper 7ee070e7bc [AVX512] Add support for 512-bit ANDN now that all ones build vectors survive long enough to allow the matching.
llvm-svn: 275046
2016-07-11 05:36:53 +00:00
Craig Topper 516e14cd8e [AVX512] Use vpternlog with an immediate of 0xff to create 512-bit all one vectors.
llvm-svn: 275045
2016-07-11 05:36:48 +00:00
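Note on the vpternlog entry above: the instruction's 8-bit immediate acts as a truth table indexed by one bit from each of three operands, so an immediate of 0xff yields all-ones regardless of the inputs. A scalar sketch of that idea follows; the function `ternlog`, its 32-bit default width, and the operand-to-index ordering are assumptions chosen for illustration.

```python
def ternlog(a: int, b: int, c: int, imm8: int, width: int = 32) -> int:
    """Bitwise ternary logic: imm8 is a truth table indexed by (a, b, c) bits."""
    result = 0
    for bit in range(width):
        index = (
            (((a >> bit) & 1) << 2)
            | (((b >> bit) & 1) << 1)
            | ((c >> bit) & 1)
        )
        result |= ((imm8 >> index) & 1) << bit
    return result


# Every truth-table entry is 1, so the inputs are irrelevant.
assert ternlog(0x12345678, 0x0, 0xDEADBEEF, 0xFF) == 0xFFFFFFFF
# Every truth-table entry is 0, so the result is zero.
assert ternlog(0x12345678, 0x0, 0xDEADBEEF, 0x00) == 0
```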
Jan Vesely 2fa28c330c AMDGPU/R600: Add implicitarg.ptr intrinsic
Differential Revision: http://reviews.llvm.org/D21622

llvm-svn: 275024
2016-07-10 21:20:29 +00:00
Simon Pilgrim 2191faa433 [X86][SSE] Add support for target shuffle combining to PSHUFLW/PSHUFHW
llvm-svn: 275022
2016-07-10 21:02:47 +00:00
Sanjay Patel ccd08fc8c4 [x86, SSE, AVX] add tests for icmp+zext (PR28484)
Note the inconsistent vpbroadcast generation for AVX2; another bug.

llvm-svn: 275020
2016-07-10 20:45:14 +00:00
Simon Pilgrim 51c786bd91 [X86][SSE] Added tests for combining shuffles to PSHUFLW/PSHUFHW
llvm-svn: 275019
2016-07-10 20:19:56 +00:00
Marcin Koscielnicki cf7cc724a7 [SystemZ] Utilize Test Data Class instructions.
This adds a new SystemZ-specific intrinsic, llvm.s390.tdc.f(32|64|128),
which maps straight to the test data class instructions.  A new IR pass
is added to recognize instructions that can be converted to TDC and
perform the necessary replacements.

Differential Revision: http://reviews.llvm.org/D21949

llvm-svn: 275016
2016-07-10 14:41:22 +00:00
Craig Topper 0b0954570a [AVX512] Add support for lowering to 512-bit SHUFPS.
llvm-svn: 275011
2016-07-10 05:55:53 +00:00
Simon Pilgrim 606126e848 [X86][SSE] Add support for target shuffle combining to INSERTPS
llvm-svn: 274990
2016-07-09 21:47:55 +00:00
Simon Pilgrim 890b415902 [X86][SSE] Regenerate vector shift tests
llvm-svn: 274987
2016-07-09 20:55:20 +00:00
Matt Arsenault c1e6a45f2e AMDGPU: Merge / reorganize tests
llvm-svn: 274972
2016-07-09 08:02:28 +00:00
Matt Arsenault b2cb5f8105 AMDGPU: Simplify tests with per function subtargets
llvm-svn: 274971
2016-07-09 07:55:03 +00:00
Matt Arsenault dfec5ce032 AMDGPU: Fix fdiv lowering when f32 denormals supported
Also fix test not actually using function labels.

llvm-svn: 274969
2016-07-09 07:48:11 +00:00
Craig Topper 70610cf7b6 [X86] Remove and autoupgrade 512-bit non-temporal store intrinsics.
llvm-svn: 274966
2016-07-09 04:38:27 +00:00
Matt Arsenault 1322b6f8bb AMDGPU: Improve offset folding for register indexing
llvm-svn: 274954
2016-07-09 01:13:56 +00:00
Matthias Braun 152e7c8b12 VirtRegMap: Replace some identity copies with KILL instructions.
An identity COPY like this:
   %AL = COPY %AL, %EAX<imp-def>
has no semantic effect, but encodes liveness information: Further users
of %EAX only depend on this instruction even though it does not define
the full register.

Replace the COPY with a KILL instruction in those cases to maintain this
liveness information. (This reverts a small part of r238588 but this
time adds a comment explaining why a KILL instruction is useful).

llvm-svn: 274952
2016-07-09 00:19:07 +00:00
Jacques Pienaar 9e70127b0a [lanai] Update test to use peephole-opt and not peephole-opts
llvm-svn: 274945
2016-07-08 22:28:29 +00:00
Matt Arsenault 3fb8f9eabf Reapply r274829 with fix for FP vectors
llvm-svn: 274937
2016-07-08 21:25:33 +00:00
Nico Weber 28410c6846 Revert r274829, it caused PR28472.
llvm-svn: 274916
2016-07-08 19:52:19 +00:00
Simon Pilgrim 0a0e0d4e8e [X86] Regenerated bitreverse tests to demonstrate what is going on.
llvm-svn: 274915
2016-07-08 19:51:08 +00:00
Simon Pilgrim aaaeedb8cb [X86] Added bitreverse tests for non-legal types
Requested on D21578

llvm-svn: 274914
2016-07-08 19:48:33 +00:00
Simon Pilgrim 950419f948 [X86][AVX2] Add support for target shuffle combining to VPERMPD/VPERMQ
llvm-svn: 274908
2016-07-08 19:23:29 +00:00
Simon Pilgrim b600ba3b79 [X86][AVX] Added combine test that should simplify to insertps
llvm-svn: 274884
2016-07-08 17:01:42 +00:00
Matt Arsenault 44540a3db2 PeepholeOptimizer: Make pass name match DEBUG_TYPE
llvm-svn: 274874
2016-07-08 16:29:11 +00:00
Chris Dewhurst 3202f065b8 [Sparc] Leon errata fix passes.
Errata fixes for various errata in different versions of the Leon variants of the Sparc 32 bit processor.

The nature of the errata are listed in the comments preceding the errata fix passes. Relevant unit tests are implemented for each of these.

Note: Running clang-format has changed a few other lines too, unrelated to the implemented errata fixes. These have been left in as this keeps the code formatting consistent.

Differential Revision: http://reviews.llvm.org/D21960

llvm-svn: 274856
2016-07-08 15:33:56 +00:00
Sjoerd Meijer 1ee119f897 Do not expand SDIV when compiling for minimum code size
Differential Revision: http://reviews.llvm.org/D22139

llvm-svn: 274855
2016-07-08 15:32:01 +00:00
Sjoerd Meijer 46c4c3d31c Addressing post-commit comments regarding not expanding UDIV;
we don't expand only when compiling for minimum code size.

llvm-svn: 274847
2016-07-08 14:17:09 +00:00
Sjoerd Meijer a625af3feb Code size optimisation: don't expand a div to a mul and a shift sequence.
As a result, the urem instruction will not be expanded to a sequence of umull,
lsrs, muls and sub instructions, but just a call to __aeabi_uidivmod.

Differential Revision: http://reviews.llvm.org/D22131

llvm-svn: 274843
2016-07-08 12:54:43 +00:00
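Note on the entry above: the expansion being suppressed for minimum code size is the usual trick of dividing by a constant with a multiply and a shift, followed by a multiply and subtract to recover the remainder. A sketch for an unsigned 32-bit value and a divisor of 10 follows; the divisor, the magic constant, and the function names are illustrative choices, not values from the patch.

```python
MAGIC_DIV10 = 0xCCCCCCCD  # ceil(2**35 / 10)
SHIFT_DIV10 = 35


def udiv10_expanded(x: int) -> int:
    """Unsigned 32-bit x // 10 via a widening multiply and a right shift."""
    return (x * MAGIC_DIV10) >> SHIFT_DIV10


def urem10_expanded(x: int) -> int:
    """Unsigned 32-bit x % 10 via the quotient, a multiply, and a subtract."""
    return x - udiv10_expanded(x) * 10


for x in [0, 1, 9, 10, 11, 99, 123456789, 2**31, 2**32 - 1]:
    assert udiv10_expanded(x) == x // 10
    assert urem10_expanded(x) == x % 10
```

The expanded form is typically faster than a library call but takes several instructions, which is the code-size cost the commit message weighs against a single call to __aeabi_uidivmod.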
Simon Pilgrim 828c731880 [X86][SSE] Accept any shuffle mask that is all zeroes
Until we have a better way to extract constants through bitcasted build vectors (and how to handle undefs of partial lanes etc.) at least accept build vectors that are all zeroes.

llvm-svn: 274833
2016-07-08 10:39:12 +00:00
Matt Arsenault c3a6fe6ecd Bug 28444: Fix assertion when extract_vector_elt has mismatched type
For some reason extract_vector_elt is sometimes allowed to have
a different result type than the vector element type.

llvm-svn: 274829
2016-07-08 07:05:00 +00:00
Craig Topper f7bf6de0af [AVX512] Remove and autoupgrade a duplicate set of 512-bit masked shift intrinsics.
I'm not sure if clang ever used these builtin names or not.

llvm-svn: 274827
2016-07-08 06:14:47 +00:00
Wei Mi 90d195a5fd [PM] Port UnreachableBlockElim to the new Pass Manager
Differential Revision: http://reviews.llvm.org/D22124

llvm-svn: 274824
2016-07-08 03:32:49 +00:00
Saleem Abdulrasool eb059b0e0a ARM: support high registers in __builtin_longjmp on WoA
Windows on ARM uses a pure thumb-2 environment.  This means that it can select a
high register when doing a __builtin_longjmp.  We would use a tLDRi which would
truncate the register to a low register.  Use a t2LDRi12 to get the full
register file access.  Tweak the code to just load into PC, as that is an
interworking branch on all supported cores anyways.

llvm-svn: 274815
2016-07-08 00:48:22 +00:00
Jacques Pienaar 6d3eecc843 [lanai] Use peephole optimizer to generate more conditional ALU operations.
Summary:
* Similar to the ARM backend, use the peephole optimizer to generate more conditional ALU operations;
* Add predicated type with default always true to RR instructions in LanaiInstrInfo.td;
* Move LanaiSetflagAluCombiner into optimizeCompare;
* The ASM parser can currently only handle explicitly specified CC, so specify ".t" (true) where needed in the ASM test;
* Remove unused MachineOperand flags;

Reviewers: eliben

Subscribers: aemerson

Differential Revision: http://reviews.llvm.org/D22072

llvm-svn: 274807
2016-07-07 23:36:04 +00:00
Michael Kuperstein 3e3652aef2 Recommit r274692 - [X86] Transform setcc + movzbl into xorl + setcc
xorl + setcc is generally the preferred sequence due to the partial register
stall setcc + movzbl suffers from. As a bonus, it also encodes one byte smaller.
This fixes PR28146.

The original commit tried inserting an 8bit-subreg into a GR32 (not GR32_ABCD)
which was not appreciated by fast regalloc on 32-bit.

llvm-svn: 274802
2016-07-07 22:50:23 +00:00
Chad Rosier 112d0e996b [AArch64] Change the preferred alignment for char and short to word alignment.
The commit reinstates r273279, which was informally approved.

Original Review: http://reviews.llvm.org/D21414

This reverts commit ca632c91aaa7cafc50942f890c49f727a046ace1.

llvm-svn: 274790
2016-07-07 20:02:18 +00:00
Tim Northover 1d106c5fc2 tests: accept different TargetOpcode values.
These tests don't actually care about the internal opcode number, but have to
be updated whenever we add a new one for GlobalISel. That's bad.

llvm-svn: 274774
2016-07-07 17:51:42 +00:00
Michael Kuperstein edb38a94f8 Revert r274692 to check whether this is what breaks windows selfhost.
llvm-svn: 274771
2016-07-07 16:55:35 +00:00
Justin Bogner a466cc33fa NVPTX: Remove the legacy ptx intrinsics
- Rename the ptx.read.* intrinsics to nvvm.read.ptx.sreg.* - some but
  not all of these registers were already accessible via the nvvm
  name.
- Rename ptx.bar.sync nvvm.bar.sync, to match nvvm.bar0.

There's a fair amount of code motion here, but it's all very
mechanical.

llvm-svn: 274769
2016-07-07 16:40:17 +00:00
Chad Rosier 3972953efd Revert "[AArch64] Change the preferred alignment for char and short to word alignment"
This reverts commit r273279 as the change was not properly approved.

llvm-svn: 274768
2016-07-07 16:37:29 +00:00
Craig Topper d5d2a35013 [AVX512] Zero extend the result of vpcmpeq/vpcmpgt and similar intrinsics in the autoupgrade code. This currently results in worse codegen but is needed for correctness.
llvm-svn: 274736
2016-07-07 06:11:07 +00:00
Manman Ren 524ca27b90 Add testing coverage for r274582.
llvm-svn: 274693
2016-07-06 22:01:28 +00:00
Michael Kuperstein 1ef6c59b1d [X86] Transform setcc + movzbl into xorl + setcc
xorl + setcc is generally the preferred sequence due to the partial register
stall setcc + movzbl suffers from. As a bonus, it also encodes one byte smaller.

This fixes PR28146.

Differential Revision: http://reviews.llvm.org/D21774

llvm-svn: 274692
2016-07-06 21:56:18 +00:00
Matthias Braun ad0032a649 AArch64: Change modeling of zero cycle zeroing.
On CPUs with the zero cycle zeroing feature enabled "movi v.2d" should
be used to zero a vector register. This was previously done at
instruction selection time, however the register coalescer sometimes
widened multiple vregs to the Q width because of that leading to extra
spills. This patch leaves the decision on how to zero a register to the
AsmPrinter phase where it doesn't affect register allocation anymore.

This patch also sets isAsCheapAsAMove=1 on FMOVS0, FMOVD0.

This fixes http://llvm.org/PR27454, rdar://25866262

Differential Revision: http://reviews.llvm.org/D21826

llvm-svn: 274686
2016-07-06 21:39:33 +00:00
Justin Lebar 6f9d01bbd5 [NVPTX] Add sm_60, sm_61, sm_62 targets to LLVM.
Reviewers: tra

Subscribers: jholewinski, llvm-commits

Differential Revision: http://reviews.llvm.org/D22068

llvm-svn: 274674
2016-07-06 21:06:10 +00:00
Justin Bogner a463537a36 NVPTX: Replace uses of cuda.syncthreads with nvvm.barrier0
Everywhere where cuda.syncthreads or __syncthreads is used, use the
properly namespaced nvvm.barrier0 instead.

llvm-svn: 274664
2016-07-06 20:02:45 +00:00
Elliot Colp bc2cfc2291 [SystemZ] Remove AND mask of bottom 6 bits when result is used for shift/rotate
On SystemZ, shift and rotate instructions only use the bottom 6 bits of the shift/rotate amount.
Therefore, if the amount is ANDed with an immediate mask that has all of the bottom 6 bits set, we
can remove the AND operation entirely.

Differential Revision: http://reviews.llvm.org/D21854

llvm-svn: 274650
2016-07-06 18:13:11 +00:00
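Note on the SystemZ entry above: if a shift reads only the bottom 6 bits of its amount, then masking the amount with any immediate whose bottom 6 bits are all set cannot change the result. The sketch below models a 64-bit left shift with that property; the helper names and the sample masks are assumptions for illustration.

```python
MASK64 = (1 << 64) - 1


def shift_left_64(value: int, amount: int) -> int:
    """Model a 64-bit shift that reads only the low 6 bits of the amount."""
    return (value << (amount & 0x3F)) & MASK64


def and_mask_is_redundant(mask: int) -> bool:
    """The AND can be dropped when the mask keeps all of the bottom 6 bits."""
    return (mask & 0x3F) == 0x3F


value = 0x0123456789ABCDEF
for mask in (0x3F, 0xFF, 0xFFFF):
    assert and_mask_is_redundant(mask)
    for amount in range(256):
        assert shift_left_64(value, amount & mask) == shift_left_64(value, amount)

# A mask that clears one of the low 6 bits is not redundant.
assert not and_mask_is_redundant(0x1F)
assert shift_left_64(value, 32 & 0x1F) != shift_left_64(value, 32)
```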
Kit Barton f9d0a40573 Ensure all uses of permute instructions feed vector stores
There is a problem in VSXSwapRemoval where it is incorrectly removing permute instructions.
In this case, the permute is feeding both a vector store and also a non-store instruction. In this case, the permute cannot be removed.

The fix is to simply look at all the uses of the vector register defined by the permute and ensure that all the uses are vector store instructions.

This problem was reported in PR 27735 (https://llvm.org/bugs/show_bug.cgi?id=27735).

Test case based on the original problem reported.

Phabricator Review: http://reviews.llvm.org/D21802

llvm-svn: 274645
2016-07-06 18:03:52 +00:00
Tim Shen 1c3c0afc53 [DAGCombiner] Fix visitSTORE to continue processing current SDNode, if findBetterNeighborChains doesn't actually CombineTo it.
Summary:
findBetterNeighborChains may or may not find a better chain for each node it finds, which includes the node ("St") that visitSTORE is currently processing. If no better chain is found for St, visitSTORE should continue instead of returning SDValue(St, 0) as if St had been CombineTo'ed.

This fixes bug 28130. There might be other ways to make the test pass (see D21409). I think both of the patches are fixing actual bugs revealed by the same testcase.

Reviewers: echristo, wschmidt, hfinkel, kbarton, amehsan, arsenm, nemanjai, bogner

Subscribers: mehdi_amini, nemanjai, llvm-commits

Differential Revision: http://reviews.llvm.org/D21692

llvm-svn: 274644
2016-07-06 17:44:03 +00:00
Simon Pilgrim 118da63a9d [X86][SSE] Added test cases for missed opportunities to combine pshufb to pslldq/psrldq
llvm-svn: 274631
2016-07-06 15:09:48 +00:00
Elena Demikhovsky ad0a56f3da Re-commit of 274613.
The prev commit failed on compilation.
A minor change in one pattern in lib/Target/X86/X86InstrAVX512.td fixes the failure.

llvm-svn: 274626
2016-07-06 14:15:43 +00:00
Diana Picus b772e409ba [ARM] Do not test for CPUs, use SubtargetFeatures. Also remove 2 flags.
This is a follow-up for r273544.

The end goal is to get rid of the isSwift / isCortexXY / isWhatever methods.

This commit also removes two command-line flags that weren't used in any of the
tests: widen-vmovs and swift-partial-update-clearance. The former may be easily
replaced with the mattr mechanism, but the latter may not (as it is a subtarget
property, and not a proper feature).

Differential Revision: http://reviews.llvm.org/D21797

llvm-svn: 274620
2016-07-06 11:22:11 +00:00
Elena Demikhovsky 02ced295aa Reverted 274613 due to compilation failure.
llvm-svn: 274615
2016-07-06 09:11:49 +00:00
Elena Demikhovsky 5a4f2476fd AVX-512: Optimization for patterns with i1 scalar type
The patch removes redundant kmov instructions (not all, we still have a lot of work here) and redundant "and" instructions after "setcc".
I use "AssertZero" marker between X86ISD::SETCC node and "truncate" to eliminate extra "and $1" instruction.
I also changed the zext, aext and trunc patterns in the .td file, which allows extra "kmov" instructions to be removed.

This patch fixes https://llvm.org/bugs/show_bug.cgi?id=28173.

Fast ISEL mode is not supported correctly for AVX-512. ICMP/FCMP scalar instruction should return result in k-reg. It will be fixed in one of the next patches. I redirected handling of "cmp" to the DAG builder mode. (The code looks worse in one specific test case, but without this fix the new patch fails).

Differential revision: http://reviews.llvm.org/D21956

llvm-svn: 274613
2016-07-06 09:01:20 +00:00
Nicolai Haehnle e40530ea7b AMDGPU: Fix return of non-void-returning shaders
Summary:
Since "AMDGPU: Fix verifier errors in SILowerControlFlow", the logic that
ensures that a non-void-returning shader falls off the end of the last
basic block was effectively disabled, since SI_RETURN is now used.

Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=96731

Reviewers: arsenm, tstellarAMD

Subscribers: arsenm, kzhuravl, llvm-commits

Differential Revision: http://reviews.llvm.org/D21975

llvm-svn: 274612
2016-07-06 08:35:17 +00:00
Tim Northover e6ae6767d9 AArch64: TableGenerate system instruction operands.
The way the named arguments for various system instructions are handled at the
moment has a few problems:

  - Large-scale duplication between AArch64BaseInfo.h and AArch64BaseInfo.cpp
  - That weird Mapping class that I have no idea what I was on when I thought
    it was a good idea.
  - Searches are performed linearly through the entire list.
  - We print absolutely all registers in upper-case, even though some are
    canonically mixed case (SPSel for example).
  - The ARM ARM specifies sysregs in terms of 5 fields, but those are relegated
    to comments in our implementation, with a slightly opaque hex value
    indicating the canonical encoding LLVM will use.

This adds a new TableGen backend to produce efficiently searchable tables, and
switches AArch64 over to using that infrastructure.

llvm-svn: 274576
2016-07-05 21:23:04 +00:00
Balaram Makam d4acd7ed10 Revert r259387: "AArch64: Implement missed conditional compare sequences."
This reverts commit r259387 because it inserts illegal code after legalization
in some backends, for example where the i64 OR type is illegal.

llvm-svn: 274573
2016-07-05 20:24:05 +00:00
Simon Pilgrim bec6543d17 [X86][AVX2] Add support for target shuffle combining to BROADCAST
Only support broadcast from vector register so far - memory folding support will have to wait.

llvm-svn: 274572
2016-07-05 20:11:29 +00:00
Simon Pilgrim 48adedffb7 [X86][AVX512] Fixed decoding of permd/permpd variable mask shuffles + enabled them for target shuffle combining
Corrected element mask masking to extract the bottom index bits (now matches the perm2 implementation but for unary inputs).

llvm-svn: 274571
2016-07-05 18:31:17 +00:00
Saleem Abdulrasool 4d950ef892 ARM: fix `-mlong-calls` for WoA
Not all code-paths set the relocation model to static for Windows.  This
currently breaks on Windows ARM with `-mlong-calls` when built with clang.
Loosen the assertion to what it was previously.  We would ideally ensure that
all the configuration sets Windows to static relocation model.

llvm-svn: 274570
2016-07-05 18:30:52 +00:00
Matt Arsenault 2d79389508 DAGCombiner: Fold away vector extract of insert with the same index
This only really matters when the index is non-constant since the
constant case already gets taken care of by other combines.

llvm-svn: 274569
2016-07-05 18:25:02 +00:00
Tim Northover 01dff9d18a AArch64: use correct SDValue # when looking for bitfield placement.
The other use really does only care about the SDNode (it checks the
opcode against a whitelist), but bitFieldPlacement can be misled if
the node produces multiple results.

Patch by Ismail Badawi.

llvm-svn: 274567
2016-07-05 18:02:57 +00:00
Matt Arsenault ffc8275f2b AMDGPU: Fix folding SGPRs into madak/madmk src0
Because of the special immediate operand, the constant
bus is already used so SGPRs are never useful.

r263212 changed the name of the immediate operand, which
broke the verifier check for the restriction.

llvm-svn: 274564
2016-07-05 17:09:01 +00:00
Simon Pilgrim 4e96fbf3c1 [X86][AVX512] Autoupgrade the BROADCAST intrinsics
llvm-svn: 274550
2016-07-05 13:58:47 +00:00
Simon Pilgrim 1e91654b38 [X86][AVX512BW] Added BROADCAST intrinsics fast-isel generic IR tests
llvm-svn: 274545
2016-07-05 13:16:05 +00:00
James Molloy ae5ff990ae [Thumb] Reapply r272251 with a fix for PR28348 (mk 2)
The important thing I was missing was ensuring newly added constants were kept in topological order. Repositioning the node is correct if the constant is newly added (so it has no topological ordering) but wrong if it already existed - positioning it next in the worklist would break the topological ordering.

Original commit message:
  [Thumb] Select a BIC instead of AND if the immediate can be encoded more optimally negated

  If an immediate is only used in an AND node, it is possible that the immediate can be more optimally materialized when negated. If this is the case, we can negate the immediate and use a BIC instead;

    int i(int a) {
      return a & 0xfffffeec;
    }

  Used to produce:
      ldr r1, [CONSTPOOL]
      ands r0, r1
    CONSTPOOL: 0xfffffeec

  And now produces:
      movs    r1, #255
      adds    r1, #20  ; Less costly immediate generation
      bics    r0, r1

llvm-svn: 274543
2016-07-05 12:37:13 +00:00
Simon Pilgrim 20ede63a33 [X86][AVX512] Added BROADCAST intrinsics fast-isel generic IR tests
llvm-svn: 274537
2016-07-05 10:15:14 +00:00
Nemanja Ivanovic 44513e545f [PowerPC] - Legalize vector types by widening instead of integer promotion
This patch corresponds to review:
http://reviews.llvm.org/D20443

It changes the legalization strategy for illegal vector types from integer
promotion to widening. This only applies for vectors with elements of width
that is a multiple of a byte since we have hardware support for vectors with
1, 2, 4, 8 and 16 byte elements.
Integer promotion for vectors is quite expensive on PPC due to the sequence
of breaking apart the vector, extending the elements and reconstituting the
vector. Two of these operations are expensive.
This patch causes between minor and major improvements in performance on most
benchmarks. There are very few benchmarks whose performance regresses. These
regressions can be handled in a subsequent patch with a DAG combine (similar
to how this patch handles int -> fp conversions of illegal vector types).

llvm-svn: 274535
2016-07-05 09:22:29 +00:00
Simon Pilgrim dea33cc2f3 [X86][AVX512] Added VSHUFPD intrinsics fast-isel generic IR tests
llvm-svn: 274534
2016-07-05 09:10:07 +00:00
Simon Pilgrim 8a01915bd2 [X86][AVX512VL] Added VSHUFPD/VSHUFPS intrinsics fast-isel generic IR tests
llvm-svn: 274533
2016-07-05 09:09:41 +00:00
Simon Pilgrim 3ad040909a [X86][AVX512] Add support for lowering shuffles to VSHUFPD
llvm-svn: 274520
2016-07-04 20:41:24 +00:00
James Molloy c3b4ed4a70 Revert "[Thumb] Reapply r272251 with a fix for PR28348"
This reverts commit r274510 - it made green dragon unhappy.

llvm-svn: 274512
2016-07-04 17:14:24 +00:00
James Molloy 9f019835ef [Thumb] Reapply r272251 with a fix for PR28348
We were using DAG->getConstant instead of DAG->getTargetConstant. This meant that we could inadvertently increase the use count of a constant if stars aligned, which it did in this testcase. Increasing the use count of the constant could cause ISel to fall over (because DAGToDAG lowering assumed the constant had only one use!)

Original commit message:
  [Thumb] Select a BIC instead of AND if the immediate can be encoded more optimally negated

  If an immediate is only used in an AND node, it is possible that the immediate can be more optimally materialized when negated. If this is the case, we can negate the immediate and use a BIC instead;

    int i(int a) {
      return a & 0xfffffeec;
    }

  Used to produce:
      ldr r1, [CONSTPOOL]
      ands r0, r1
    CONSTPOOL: 0xfffffeec

  And now produces:
      movs    r1, #255
      adds    r1, #20  ; Less costly immediate generation
      bics    r0, r1

llvm-svn: 274510
2016-07-04 16:35:41 +00:00
Simon Pilgrim 02d435d2f4 [X86][AVX512] Autoupgrade the VPERMPD/VPERMQ intrinsics
llvm-svn: 274506
2016-07-04 14:19:05 +00:00
Simon Pilgrim 8b82fce537 [X86][AVX512] Added VPERMPD/VPERMQ intrinsics fast-isel generic IR tests
llvm-svn: 274503
2016-07-04 13:43:10 +00:00
Simon Pilgrim 9fca300cbe [X86][AVX512] Autoupgrade the VPERMILPD/VPERMILPS intrinsics
llvm-svn: 274498
2016-07-04 12:40:54 +00:00
Simon Pilgrim c8cf2ddb6d [X86][AVX512] Added VPERMILPD/VPERMILPS intrinsics fast-isel generic IR tests
Added PSHUFD tests as well

llvm-svn: 274493
2016-07-04 11:07:50 +00:00
Craig Topper d83f818a3e [CodeGen] Make the code that detects if a shuffle is really a concatenation of the inputs more general purpose.
We can now handle concatenation of each source multiple times. The previous code just checked for each source to appear once in either order.

This also now handles an entire source vector sized piece having undef indices correctly. We now concat with UNDEF instead of using one of the sources. This is responsible for the test case change.
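
A rough sketch of such a check, under assumptions that are not taken from the patch: the mask is an array of lane indices, indices below `src_elts` select from the first source, the next `src_elts` select from the second, and -1 marks an undef lane. The function name and signature are hypothetical.

```c
#include <stdbool.h>
#include <stddef.h>

/* mask has pieces * src_elts entries. On success, piece_src[p] is 0 or 1
   for the source that piece p copies, or -1 when every lane of the piece
   is undef (that piece would be concatenated as UNDEF). */
bool shuffle_is_concat(const int *mask, size_t src_elts, size_t pieces,
                       int *piece_src) {
    for (size_t p = 0; p < pieces; ++p) {
        int src = -1;
        for (size_t i = 0; i < src_elts; ++i) {
            int idx = mask[p * src_elts + i];
            if (idx < 0)
                continue;                  /* undef lane matches anything */
            if ((size_t)idx % src_elts != i)
                return false;              /* lane is not in its original position */
            int lane_src = (int)((size_t)idx / src_elts);
            if (lane_src > 1 || (src >= 0 && src != lane_src))
                return false;              /* out of range, or mixes sources */
            src = lane_src;
        }
        piece_src[p] = src;
    }
    return true;
}
```

Under this model a mask like {0, 1, 0, 1} is accepted (the same source twice), and {-1, -1, 0, 1} reports the first piece as undef rather than borrowing one of the sources.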

llvm-svn: 274483
2016-07-04 06:19:35 +00:00
Simon Pilgrim 7f096de0b8 [X86][AVX512] Add support for 512-bit shuffle lowering to VPERMPD/VPERMQ
llvm-svn: 274473
2016-07-03 19:50:06 +00:00
Craig Topper d1eca0f32c [CodeGen] Teach OR combine of shuffles involving zero vectors to better handle undef indices.
Undef indices can now be treated as zeros. Or, if it's undef ORed with zero, we will keep the undef.

llvm-svn: 274472
2016-07-03 19:37:12 +00:00
Craig Topper 8e826d5abe [X86] Add tests to show that the DAG combine for OR of shuffles with zero vectors doesn't handle undefs as well as it could. Fix coming in another commit.
llvm-svn: 274471
2016-07-03 19:37:10 +00:00
Haicheng Wu b71b2f622a [MBB] add a missing corner case in UpdateTerminator()
After block placement, if a block ends with a conditional branch but the
next block is not its successor, the conditional branch should be changed to
an unconditional branch.  This patch fixes PR28307, PR28297, PR28402.

Differential Revision: http://reviews.llvm.org/D21811

llvm-svn: 274470
2016-07-03 19:14:17 +00:00
Simon Pilgrim 68ea80649b [X86][AVX512] Add support for VPERMPD/VPERMQ masked shuffle comments
llvm-svn: 274469
2016-07-03 18:40:24 +00:00
Simon Pilgrim a0d73835b2 [X86][AVX512] Add support for 512-bit shuffle decoding of VPERMPD/VPERMQ
llvm-svn: 274468
2016-07-03 18:27:37 +00:00
Simon Pilgrim dbd6db0dc7 [X86][AVX512] Add support for VPALIGNR/PSHUFD/PSHUFHW/PSHUFLW masked shuffle comments
llvm-svn: 274466
2016-07-03 15:00:51 +00:00
Simon Pilgrim 598bdb6bfe [X86][AVX512] Add support for UNPCK masked shuffle comments
llvm-svn: 274464
2016-07-03 14:26:21 +00:00
Simon Pilgrim 1f59076196 [X86][AVX512] Add support for VPERM/VSHUF masked shuffle comments
llvm-svn: 274462
2016-07-03 13:55:41 +00:00
Simon Pilgrim 68f438a036 [X86][AVX512] Add support for PMOVZX masked shuffle comments
llvm-svn: 274461
2016-07-03 13:33:28 +00:00
Simon Pilgrim 7c2fbdc101 [X86][AVX512] Add support for masked shuffle comments
This patch adds support for including the avx512 mask register information in the mask/maskz versions of shuffle instruction comments.

This initial version just adds support for MOVDDUP/MOVSHDUP/MOVSLDUP to reduce the mass of test regenerations, other shuffle instructions can be added in due course.

Differential Revision: http://reviews.llvm.org/D21953

llvm-svn: 274459
2016-07-03 13:08:29 +00:00
Simon Pilgrim 129b720c18 [X86][AVX512] Add support for lowering shuffles to VPERMILPS
llvm-svn: 274458
2016-07-03 12:47:21 +00:00
Simon Pilgrim 99e8a1aa0b [X86][AVX512] Add support for lowering shuffles to VPERMILPD
llvm-svn: 274450
2016-07-02 20:20:12 +00:00
Simon Pilgrim 72052f6de9 [X86][AVX512VL] Add fast-isel MOVDDUP/MOVSLDUP/MOVSHDUP shuffle tests
llvm-svn: 274448
2016-07-02 19:22:46 +00:00
Simon Pilgrim cde7c54baa [X86][AVX512] Add support for 512-bit PSHUFB lowering
llvm-svn: 274444
2016-07-02 18:14:31 +00:00
Simon Pilgrim 77dda7c2e0 [X86][AVX512] Converted the MOVDDUP/MOVSLDUP/MOVSHDUP masked intrinsics to generic IR
llvm-svn: 274443
2016-07-02 17:16:41 +00:00
Simon Pilgrim 19adee9d84 [X86][AVX512] Autoupgrade the MOVDDUP/MOVSLDUP/MOVSHDUP intrinsics
llvm-svn: 274439
2016-07-02 14:42:35 +00:00
Simon Pilgrim f040d8c061 [X86][AVX512] Add support for lowering shuffles to MOVDDUP/MOVSLDUP/MOVSHDUP
llvm-svn: 274436
2016-07-02 12:45:03 +00:00
Simon Pilgrim 5e95390957 [X86][AVX512] Add test cases that should lower to MOVSLDUP/MOVSHDUP
llvm-svn: 274435
2016-07-02 12:20:35 +00:00
Simon Pilgrim a6f262a1f9 [X86][AVX512] Add fast-isel shuffle tests
It's not worth trying to write out tests for all the avx512f builtins yet, just adding tests for lowering of generic IR as we transition to it (shuffles mainly right now).

llvm-svn: 274434
2016-07-02 12:13:29 +00:00
Matt Arsenault accddacb70 TII: Fix inlineasm size counting comments as insts
The main problem was counting comments on their own
line as instructions.
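
To illustrate the idea only, here is a small counter that skips comment-only lines. It assumes '\n' separates statements and '#' starts a comment; real targets define their own separator and comment strings, and the function name is hypothetical.

```c
#include <ctype.h>
#include <stddef.h>

/* Counts statements in an inline-asm string. Lines that contain only
   whitespace or a comment contribute nothing to the count. */
size_t count_asm_insts(const char *text) {
    size_t count = 0;
    int line_has_inst = 0;
    int in_comment = 0;
    for (const char *p = text; ; ++p) {
        if (*p == '\n' || *p == '\0') {
            count += (size_t)line_has_inst;
            line_has_inst = 0;
            in_comment = 0;
            if (*p == '\0')
                break;
        } else if (*p == '#') {
            in_comment = 1;            /* rest of the line is a comment */
        } else if (!in_comment && !isspace((unsigned char)*p)) {
            line_has_inst = 1;         /* saw real text before any comment */
        }
    }
    return count;
}
```

A line with an instruction followed by a trailing comment still counts once; a line holding nothing but a comment counts zero.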

llvm-svn: 274405
2016-07-01 23:26:50 +00:00
Matt Arsenault 7f681ac7a9 AMDGPU: Add feature for unaligned access
llvm-svn: 274398
2016-07-01 23:03:44 +00:00
Matt Arsenault 8af47a09e5 AMDGPU: Expand unaligned accesses early
Due to visit order problems, in the case of an unaligned copy
the legalized DAG fails to eliminate extra instructions introduced
by the expansion of both unaligned parts.

llvm-svn: 274397
2016-07-01 22:55:55 +00:00
Matt Arsenault 327bb5ad82 AMDGPU: Improve load/store of illegal types.
There was a combine before to handle the simple copy case.
Split this into handling loads and stores separately.

We might want to change how this handles some of the vector
extloads, since this can result in large code size increases.

llvm-svn: 274394
2016-07-01 22:47:50 +00:00
Dehao Chen 7b2c997736 Specify mtriple for the frame-order.ll test.
Summary: original test may have different behavior on different bots, specifically it broke llvm-clang-lld-x86_64-scei-ps4-ubuntu-fast

Reviewers: majnemer

Subscribers: llvm-commits

Differential Revision: http://reviews.llvm.org/D21931

llvm-svn: 274368
2016-07-01 17:35:13 +00:00
Dehao Chen ad2b4e1334 Do not count debug instructions when counting number of uses to reorder frame objects.
Summary: The code generation should be independent of the debug info.

Reviewers: zansari, davidxl, mkuper, majnemer

Subscribers: majnemer, llvm-commits

Differential Revision: http://reviews.llvm.org/D21911

llvm-svn: 274357
2016-07-01 15:40:25 +00:00
Nikolay Haustov beb24f5b20 Resubmit r268719 - AMDGPU/SI: Add amdgpu_kernel calling convention. Part 2.
This was reverted in r268740 because of problems with corresponding Clang change.
Clang change was updated and resubmitted in r274220.

Check calling convention in AMDGPUMachineFunction::isKernel

This will be used for AMDGPU_HSA_KERNEL symbol type in output ELF.

Also, in the future unused non-kernels may be optimized.

Reviewers: tstellarAMD, arsenm

Subscribers: arsenm, joker.eph, llvm-commits

Differential Revision: http://reviews.llvm.org/D19917

llvm-svn: 274341
2016-07-01 10:00:58 +00:00
Yunzhong Gao b386955adc Add an artificial line-0 debug location when the compiler emits a call to
__stack_chk_fail(). This avoids a compiler crash.

Differential Revision: http://reviews.llvm.org/D21818

llvm-svn: 274263
2016-06-30 18:49:04 +00:00
Etienne Bergeron 078d8f69b6 revert http://reviews.llvm.org/D21101
llvm-svn: 274251
2016-06-30 17:52:24 +00:00
Etienne Bergeron 47cf4eabe6 [exceptions] Upgrade exception handlers when stack protector is used
Summary:
MSVC provides exception handlers with enhanced information to deal with the security buffer feature (/GS).

To be more secure, the security cookies (GS and SEH) are validated when unwinding the stack.

The following code:
```
void f() {}

void foo() {
  __try {
    f();
  } __except(1) {
    f();
  }
}
```

Reviewers: majnemer, rnk

Subscribers: thakis, llvm-commits, chrisha

Differential Revision: http://reviews.llvm.org/D21101

llvm-svn: 274239
2016-06-30 15:36:59 +00:00
Jonas Paulsson 25e193da4c [SystemZ] Let z13 also support FeatureMiscellaneousExtensions.
This processor feature had been left out by mistake from the z13
ProcessorModel.

This time with updated test case. Thanks, Hans.

Reviewed by Ulrich Weigand.

llvm-svn: 274216
2016-06-30 07:13:56 +00:00
Artem Belevich 4d5d7be8cc Revert r273313 "[NVPTX] Improve lowering of byval args of device functions."
The change causes llvm crash in some unoptimized builds.

llvm-svn: 274163
2016-06-29 20:51:15 +00:00
Nico Weber 0a480b2c05 Add a regression test for PR28348.
llvm-svn: 274142
2016-06-29 17:34:31 +00:00
Nico Weber 12fdf60b75 Revert r272251, it caused PR28348.
llvm-svn: 274141
2016-06-29 17:33:41 +00:00
Ahmed Bougacha 15a2f6d58c [X86] Lower blended PACKUSes using appropriate types.
When lowering two blended PACKUS, we used to disregard the types
of the PACKUS inputs, indiscriminately generating a v16i8 PACKUS.

This leads to non-selectable things like:
    (v16i8 (PACKUS (v4i32 v0), (v4i32 v1)))

Instead, check that the PACKUSes have the same type, and use that
as the final result type.

llvm-svn: 274138
2016-06-29 16:56:09 +00:00
Rafael Espindola c4cabb8054 Update tests to use at least darwin9.
llvm-svn: 274129
2016-06-29 14:51:10 +00:00
Simon Pilgrim f9c5908ffd [X86][SSE2] Added _mm_loadu_si64 test to match llvm\tools\clang\test\CodeGen\sse2-builtins.c
llvm-svn: 274127
2016-06-29 14:05:33 +00:00
Simon Pilgrim 851019175b [X86] Regenerated popcnt combine tests
llvm-svn: 274124
2016-06-29 13:54:03 +00:00
Craig Topper 3a011de10c [DAGCombine] Teach DAG combine to handle ORs of shuffles involving zero vectors where the zero vector is the first operand to the shuffle instead of the second.
llvm-svn: 274097
2016-06-29 03:29:12 +00:00
Craig Topper 1e7e36e7e6 [DAGCombine] Add test cases to show that DAG combining an OR of two shuffles with zero vectors doesn't work if the zero vector is the first operand of the shuffle. Fix coming in a follow up patch.
llvm-svn: 274096
2016-06-29 03:29:09 +00:00
Dehao Chen 8cd84aaa6f Relax the clearance calculating for breaking partial register dependency.
Summary: LLVM assumes that large clearance will hide the partial register spill penalty. But in our experiment, 16 clearance is too small. As the inserted XOR is normally fairly cheap, we should have a higher clearance threshold to aggressively insert XORs that are necessary to break partial register dependency.

Reviewers: wmi, davidxl, stoklund, zansari, myatsina, RKSimon, DavidKreitzer, mkuper, joerg, spatel

Subscribers: davidxl, llvm-commits

Differential Revision: http://reviews.llvm.org/D21560

llvm-svn: 274068
2016-06-28 21:19:34 +00:00
Zhan Jun Liau 347db3e18e [SystemZ] Use NILL instruction instead of NILF where possible
Summary: SystemZ shift instructions only use the last 6 bits of the shift
amount. When the result of an AND operation is used as a shift amount, this
means that we can use the NILL instruction (which operates on the last 16 bits)
rather than NILF (which operates on the last 32 bits) for a 16-bit savings in
instruction size.
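
A small model of why the narrower mask is enough, assuming NILF ANDs the low 32 bits of a 64-bit register with its immediate and NILL ANDs only the low 16 bits, both leaving the remaining bits unchanged. The C function names are illustrative only.

```c
#include <stdint.h>

/* AND the low 32 bits with imm; upper bits are left untouched. */
uint64_t nilf(uint64_t reg, uint32_t imm) {
    return reg & (0xffffffff00000000ULL | imm);
}

/* AND the low 16 bits with imm; upper bits are left untouched. */
uint64_t nill(uint64_t reg, uint16_t imm) {
    return reg & (0xffffffffffff0000ULL | imm);
}

/* The part of a register that a shift actually consumes as its amount. */
unsigned shift_amount(uint64_t reg) {
    return (unsigned)(reg & 0x3f);
}
```

Because `shift_amount` looks only at the low 6 bits, applying the low 16 bits of a mask with `nill` yields the same shift amount as applying the full 32-bit mask with `nilf`.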

Reviewers: uweigand

Subscribers: llvm-commits

Author: colpell
Committing on behalf of Elliot.

Differential Revision: http://reviews.llvm.org/D21686

llvm-svn: 274066
2016-06-28 21:03:19 +00:00
Matthias Braun 0b9a07883d X86FrameLowering: Check subregs when deciding prolog kill flags
llvm-svn: 274057
2016-06-28 20:31:56 +00:00
Artur Pilipenko 7ad95ec22d Support arbitrary addrspace pointers in masked load/store intrinsics
This is a resubmission of the 263158 change after fixing the existing problem with intrinsics mangling (see LTO and intrinsics mangling llvm-dev thread for details).

This patch fixes the problem which occurs when loop-vectorize tries to use @llvm.masked.load/store intrinsic for a non-default addrspace pointer. It fails with "Calling a function with a bad signature!" assertion in CallInst constructor because it tries to pass a non-default addrspace pointer to the pointer argument which has default addrspace.

The fix is to add pointer type as another overloaded type to @llvm.masked.load/store intrinsics.

Reviewed By: reames

Differential Revision: http://reviews.llvm.org/D17270

llvm-svn: 274043
2016-06-28 18:27:25 +00:00
Michael Kuperstein a118acb82f [X86] Update a test with more explicit checks. NFC.
llvm-svn: 274040
2016-06-28 17:42:13 +00:00
David Majnemer 1c7d532cde [X86] Make WRPKRU/RDPKRU pass -verify-machineinstrs
The original implementation attempted to zero registers using
XOR %foo, %foo.  This is problematic because it constitutes a
read-modify-write of a register which might not be defined.

Instead, use MOV32r0 to avoid these problems; expandPostRAPseudo does
the right thing here.

llvm-svn: 274024
2016-06-28 16:04:46 +00:00
Simon Pilgrim 5f71c909f0 [X86][AVX] Peek through bitcasts to find the source of broadcasts (reapplied)
AVX1 can only broadcast vectors as floats/doubles, so for 256-bit vectors we insert bitcasts if we are shuffling v8i32/v4i64 types. Unfortunately the presence of these bitcasts prevents the current broadcast lowering code from peeking through cases where we have concatenated / extracted vectors to create the 256-bit vectors.

This patch allows us to peek through bitcasts as long as the number of elements doesn't change (i.e. element bitwidth is the same) so the broadcast index is not affected.

Note this bitcast peek is different from the stage later on which doesn't care about the type and is just trying to find a load node.

As we're being more aggressive with bitcasts, we also need to ensure that the broadcast type is correctly bitcasted.

Differential Revision: http://reviews.llvm.org/D21660

llvm-svn: 274013
2016-06-28 13:24:05 +00:00
Simon Pilgrim c15d217831 [X86][SSE] Added support for combining target shuffles to (V)PSHUFD/VPERMILPD/VPERMILPS immediate permutes
This patch allows target shuffles to be combined to single input immediate permute instructions - (V)PSHUFD/VPERMILPD/VPERMILPS - allowing more general pattern matching than what we currently do and improving the likelihood of memory folding compared to existing patterns which tend to reuse the input in multiple arguments.

Further permute instructions (V)PSHUFLW/(V)PSHUFHW/(V)PERMQ/(V)PERMPD may be added in the future but it's proven tricky to create test cases for them so far. (V)PSHUFLW/(V)PSHUFHW is already handled quite well in combineTargetShuffle so it may be that removing some of that code may allow us to perform more of the combining in one place without duplication.

Differential Revision: http://reviews.llvm.org/D21148

llvm-svn: 273999
2016-06-28 08:08:15 +00:00
Elena Demikhovsky a727f3cfde [X86 Target Lowering] Merged ICMP test.
llvm-svn: 273995
2016-06-28 06:25:38 +00:00
Nick Lewycky 9980075133 NFC. Fix popular typo in comment 'deferencing' --> 'dereferencing'.
Bonus changes, * placement in X86ISelLowering and 'exerce' -> 'exercise' in test.

llvm-svn: 273984
2016-06-28 01:45:05 +00:00
Matt Arsenault b4d9503171 AMDGPU: Fix out of bounds indirect indexing errors
This was producing accesses to registers beyond the super
register's limits, resulting in verifier failures.

llvm-svn: 273977
2016-06-28 01:09:00 +00:00
Matt Arsenault 59c0ffa22a AMDGPU: Implement per-function subtargets
llvm-svn: 273940
2016-06-27 20:48:03 +00:00
Matt Arsenault 03d8584590 AMDGPU: Move subtarget feature checks into passes
llvm-svn: 273937
2016-06-27 20:32:13 +00:00
Justin Holewinski cb29fb4a98 Only emit extension for zeroext/signext arguments if type is < 32 bits
Reviewers: jingyue, jlebar

Subscribers: jholewinski

Differential Revision: http://reviews.llvm.org/D21756

llvm-svn: 273922
2016-06-27 20:22:22 +00:00
Rafael Espindola 8121becac3 Teach shouldAssumeDSOLocal about tls.
Fixes a fixme about handling other visibilities.

llvm-svn: 273921
2016-06-27 20:19:14 +00:00
Matt Arsenault 21a4625a16 AMDGPU: Fix verifier errors with undef vector indices
Also fix pointlessly adding exec to liveins.

llvm-svn: 273916
2016-06-27 19:57:44 +00:00
Matt Arsenault f0f721a682 DAGCombiner: Don't narrow volatile vector loads + extract
llvm-svn: 273909
2016-06-27 19:31:04 +00:00
Elena Demikhovsky ad3929cc64 X86 Lowering - Fixed a crash in ICMP scalar instruction
Fixed a bug in EmitTest() function in combining shl + icmp.

https://llvm.org/bugs/show_bug.cgi?id=28119

llvm-svn: 273899
2016-06-27 18:07:16 +00:00
Artur Pilipenko 72f76b8805 Revert -r273892 "Support arbitrary addrspace pointers in masked load/store intrinsics" since some of the clang tests don't expect to see the updated signatures.
llvm-svn: 273895
2016-06-27 16:54:33 +00:00
Artur Pilipenko a36aa41519 Support arbitrary addrspace pointers in masked load/store intrinsics
This is a resubmission of the 263158 change after fixing the existing problem with intrinsics mangling (see LTO and intrinsics mangling llvm-dev thread for details).

This patch fixes the problem which occurs when loop-vectorize tries to use @llvm.masked.load/store intrinsic for a non-default addrspace pointer. It fails with "Calling a function with a bad signature!" assertion in CallInst constructor because it tries to pass a non-default addrspace pointer to the pointer argument which has default addrspace.

The fix is to add pointer type as another overloaded type to @llvm.masked.load/store intrinsics.

Reviewed By: reames

Differential Revision: http://reviews.llvm.org/D17270

llvm-svn: 273892
2016-06-27 16:29:26 +00:00
Simon Pilgrim 476e8ceed3 [X86][SSE] Added extra broadcast tests to cover PR28327
llvm-svn: 273891
2016-06-27 16:15:37 +00:00
Zhan Jun Liau 4f130b4410 [SystemZ] Avoid generating 2 XOR instructions for (and (xor x, -1), y)
Summary:
Created a pattern to match 64-bit mode (and (xor x, -1), y)
to a shorter sequence of instructions.

Before the change, the canonical form is translated to:
        xihf    %r3, 4294967295
        xilf    %r3, 4294967295
        ngr     %r2, %r3

After the change, the canonical form is translated to:
        ngr     %r3, %r2
        xgr     %r2, %r3
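
The shorter sequence rests on a bitwise identity that can be sketched in C (function names here are illustrative, not from the patch): clearing the bits of x from y is the same as XORing y with the bits the two have in common.

```c
#include <stdint.h>

/* Canonical form from the summary: (and (xor x, -1), y). */
uint64_t andnot_canonical(uint64_t x, uint64_t y) {
    return (x ^ UINT64_MAX) & y;
}

/* Two-operation form: one AND followed by one XOR, mirroring the
   NGR/XGR pair shown above. */
uint64_t andnot_two_ops(uint64_t x, uint64_t y) {
    uint64_t common = x & y;   /* bits set in both operands */
    return y ^ common;         /* remove those bits from y */
}
```

The two functions agree for all inputs, so the pair of XOR-immediate instructions used to form the complement is unnecessary.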

Reviewers: zhanjunl, uweigand

Subscribers: llvm-commits

Author: assem

Committing on behalf of Assem.

Differential Revision: http://reviews.llvm.org/D21693

llvm-svn: 273887
2016-06-27 15:55:30 +00:00
Krzysztof Parzyszek 5da24e5495 [Hexagon] Equally-sized vectors are equivalent in ISel (except vNi1)
llvm-svn: 273885
2016-06-27 15:08:22 +00:00
Nico Weber 1e058160dd Revert 273848, it caused PR28329
llvm-svn: 273879
2016-06-27 14:36:46 +00:00
Simon Pilgrim 9c2f378587 Removed duplicate assertions note
llvm-svn: 273874
2016-06-27 13:06:18 +00:00
Hrvoje Varga 24b975dc66 [mips][micromips] Implement LD, LLD, LWU, SD, DSRL, DSRL32 and DSRLV instructions
Differential Revision: http://reviews.llvm.org/D16625

llvm-svn: 273850
2016-06-27 08:23:28 +00:00
Simon Pilgrim a45da385f8 [X86][AVX] Peek through bitcasts to find the source of broadcasts
AVX1 can only broadcast vectors as floats/doubles, so for 256-bit vectors we insert bitcasts if we are shuffling v8i32/v4i64 types. Unfortunately the presence of these bitcasts prevents the current broadcast lowering code from peeking through cases where we have concatenated / extracted vectors to create the 256-bit vectors.

This patch allows us to peek through bitcasts as long as the number of elements doesn't change (i.e. element bitwidth is the same) so the broadcast index is not affected.

Note this bitcast peek is different from the stage later on which doesn't care about the type and is just trying to find a load node.

Differential Revision: http://reviews.llvm.org/D21660

llvm-svn: 273848
2016-06-27 07:44:32 +00:00
Rafael Espindola 1ac1fa818e Mips: Fix access to private functions.
llvm-svn: 273843
2016-06-27 03:19:40 +00:00
Jan Vesely 3bc1af2be4 AMDGPU/R600: Fix GlobalValue regressions.
Don't cast GV expression to MCSymbolRefExpr. r272705 changed GV to binary
expressions by including offset even if the offset is 0
(we haven't hit this sooner since tested workloads don't include static offsets).
We don't really care about the type of expression, so set it directly.
Fixes: r272705

Consider section relative relocations. Since all const as data is in one buffer, section relative is equivalent to abs32.
Fixes: r273166

Differential Revision: http://reviews.llvm.org/D21633

llvm-svn: 273785
2016-06-25 18:24:16 +00:00
Konstantin Zhuravlyov f2f3d14774 [AMDGPU] Emit debugger prologue and emit the rest of the debugger fields in the kernel code header
Debugger prologue is emitted if -mattr=+amdgpu-debugger-emit-prologue.

Debugger prologue writes work group IDs and work item IDs to scratch memory at fixed location in the following format:
  - offset 0: work group ID x
  - offset 4: work group ID y
  - offset 8: work group ID z
  - offset 16: work item ID x
  - offset 20: work item ID y
  - offset 24: work item ID z
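
The layout listed above could be pictured as the following struct. This is a hypothetical illustration, not a type from the patch; in particular, the word at offset 12 is not described in the list and is named `reserved` here purely as an assumption so that the work item IDs land at offset 16.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical view of the scratch-memory record written by the
   debugger prologue; field names are illustrative. */
struct debugger_ids {
    uint32_t work_group_id_x;  /* offset 0  */
    uint32_t work_group_id_y;  /* offset 4  */
    uint32_t work_group_id_z;  /* offset 8  */
    uint32_t reserved;         /* offset 12 (assumed padding) */
    uint32_t work_item_id_x;   /* offset 16 */
    uint32_t work_item_id_y;   /* offset 20 */
    uint32_t work_item_id_z;   /* offset 24 */
};
```

`offsetof(struct debugger_ids, work_item_id_x)` evaluates to 16 under this definition, matching the list.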

Set
  - amd_kernel_code_t::debug_wavefront_private_segment_offset_sgpr to scratch wave offset reg
  - amd_kernel_code_t::debug_private_segment_buffer_sgpr to scratch rsrc reg
  - amd_kernel_code_t::is_debug_supported to true if all debugger features are enabled

Differential Revision: http://reviews.llvm.org/D20335

llvm-svn: 273769
2016-06-25 03:11:28 +00:00
Tom Stellard b164a9843b AMDGPU/SI: Make sure not to fold offsets into local address space globals
Summary:
Offset folding only works if you are emitting relocations, and we don't
emit relocations for local address space globals.

Reviewers: arsenm, nhaustov

Subscribers: arsenm, llvm-commits, kzhuravl

Differential Revision: http://reviews.llvm.org/D21647

llvm-svn: 273765
2016-06-25 01:59:16 +00:00
Matthias Braun 6ad3d05b68 MachineScheduler: Fully compare top/bottom candidates
In bidirectional scheduling this gives more stable results than just
comparing the "reason" fields of the top/bottom node because the reason
field may be higher depending on what other nodes are in the queue.

Differential Revision: http://reviews.llvm.org/D19401

llvm-svn: 273755
2016-06-25 00:23:00 +00:00
Matthias Braun 1e374a7aa6 AMDGPU: Define a schedule class for COPY.
COPY was lacking a scheduling class, define it to avoid regressions in
the upcoming change to the bidirectional MachineScheduler. Approved by
tstellar on IRC.

Differential Revision: http://reviews.llvm.org/D21540

llvm-svn: 273751
2016-06-24 23:52:11 +00:00
Krzysztof Parzyszek 709a626015 [Hexagon] Simplify (+fix) instruction selection for indexed loads/stores
llvm-svn: 273733
2016-06-24 21:27:17 +00:00
Rafael Espindola a895a0cd01 Add support for musl-libc on ARM Linux.
Patch by Lei Zhang!

llvm-svn: 273726
2016-06-24 21:14:33 +00:00
Rafael Espindola 88ae09e9be Use shouldAssumeDSOLocal in isOffsetFoldingLegal.
This makes it slightly more powerful for dynamic-no-pic.

llvm-svn: 273704
2016-06-24 18:48:36 +00:00
Kyle Butt 267164df0a Codegen: Fix broken assumption in Tail Merge.
Tail merge was making the assumption that a layout successor or
predecessor was always a cfg successor/predecessor. Remove that
assumption. Changes to tests are necessary because the errant cfg edges
were preventing optimizations.

llvm-svn: 273700
2016-06-24 18:16:36 +00:00
Rafael Espindola 955d3569e7 Use FileCheck. NFC.
llvm-svn: 273699
2016-06-24 18:04:39 +00:00
Chad Rosier fd342808e0 [MachineDominatorTree] Add a MDT verifier.
Differential Revision: http://reviews.llvm.org/D21657

llvm-svn: 273678
2016-06-24 13:32:22 +00:00
Daniel Sanders 0d97270ae5 [mips] Use --check-prefixes where appropriate. NFC.
llvm-svn: 273669
2016-06-24 12:23:17 +00:00
Matt Arsenault 86de486d31 AMDGPU: Add stub custom CodeGenPrepare pass
This will do various things including ones
CodeGenPrepare does, but with knowledge of uniform
values.

llvm-svn: 273657
2016-06-24 07:07:55 +00:00
Matt Arsenault 0534f4aa79 AMDGPU: Un-xfail and add tests
Un XFAIL a few tests plus a few more I had lying around
in my tree, which seem to all work now but I don't see tests
that quite test the same things.

llvm-svn: 273655
2016-06-24 06:58:01 +00:00
Matt Arsenault c581611e11 AMDGPU: Remove disable-irstructurizer subtarget feature
The only real reason to use it is for testing, so replace
it with a command line option instead of a potentially function
dependent feature.

llvm-svn: 273653
2016-06-24 06:30:22 +00:00
Ahmed Bougacha f0b46ee0aa [ARM] Use aapcs_vfp for ___truncdfhf2 on v7k.
r215348 overrode the f16 libcalls to be soft-float, but
v7k uses the default (hard-float) calling convention.

llvm-svn: 273631
2016-06-24 00:08:01 +00:00
Kyle Butt 991df7889b Codegen: [X86] preserve memory refs for folded umul_lohi
Memory references were not being propagated for this folded load. This
prevented optimizations like LICM from hoisting the load.

Added test to verify that this allows LICM to proceed.

llvm-svn: 273617
2016-06-23 21:40:35 +00:00
Kyle Butt 178314ab52 Codegen: LICM Remove check for exactly 1 register def.
When considering whether to split an instruction with a memory operand
into an explicit load and a register-based instruction, we currently
check that the resulting instruction has exactly 1 def. This prevents 2
important LICM optimizations: compares with memory operands, and double
indirect calls. All the tests and the test-suite pass without the check.
My guess as to original intent is to limit the additional register pressure
created by the new instruction, but given that we only split out a single
register, it is already limited.

The licm-dominance test now checks actual memory loads for hoisting instead of
undef, and it tests compares.
hoist-invariant-load.ll now checks for 2 hoists, the intended hoist, and a bonus
from calling a got-relative function in a loop.

llvm-svn: 273616
2016-06-23 21:38:49 +00:00
Rafael Espindola 2d3cce71ee Uses shouldAssumeDSOLocal.
With that SystemZ knows to avoid a GOT for PIE.

llvm-svn: 273614
2016-06-23 21:18:59 +00:00
Rafael Espindola f2898d73a5 Convert test to FileCheck.
llvm-svn: 273609
2016-06-23 20:37:49 +00:00
Michael Kuperstein 0194d30e09 [X86] Extract HiPE prologue constants into metadata
X86FrameLowering::adjustForHiPEPrologue() contains a hard-coded offset
into an Erlang Runtime System-internal data structure (the PCB). As the
layout of this data structure is prone to change, this poses problems
for maintaining compatibility.

To address this problem, the compiler can produce this information as
module-level named metadata. For example (where P_NSP_LIMIT is the
offending offset):

!hipe.literals = !{ !2, !3, !4 }
!2 = !{ !"P_NSP_LIMIT", i32 152 }
!3 = !{ !"X86_LEAF_WORDS", i32 24 }
!4 = !{ !"AMD64_LEAF_WORDS", i32 24 }

Patch by Magnus Lang

Differential Revision: http://reviews.llvm.org/D20363

llvm-svn: 273593
2016-06-23 18:17:25 +00:00
Pablo Barrio 7a64346533 [ARM] Lower (select_cc k k (select_cc ~k ~k x)) into (SSAT l_k x)
Summary:
SSAT saturates an integer, making sure that its value lies within
an interval [~k, k] (that is, [-k-1, k]). Since the constant is given to SSAT as the
number of bits set to one, k + 1 must be a power of 2, otherwise
the optimization is not possible. Also, the select_cc must use <
and > respectively so that they define an interval.
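
A rough sketch of the pattern being matched, written as ordinary C++ rather than DAG nodes. The function names are hypothetical, and the lower bound is taken to be ~k (i.e. -k - 1), following the ~k in the commit title:

```cpp
#include <cstdint>

// Clamp x to [~k, k], which is what the nested select_cc pair computes.
constexpr std::int32_t clamp_select(std::int32_t x, std::int32_t k) {
  return x > k ? k : (x < ~k ? ~k : x);
}

// The fold is only possible when k + 1 is a power of two, i.e. k looks like
// 0b0...011...1. The addition is done in unsigned arithmetic to avoid
// signed overflow for k == INT32_MAX.
constexpr bool is_ssat_bound(std::int32_t k) {
  return k >= 0 &&
         ((static_cast<std::uint32_t>(k) + 1u) &
          static_cast<std::uint32_t>(k)) == 0;
}
```

With k = 127 the clamp yields the signed 8-bit range [-128, 127]; a bound such as k = 100 is rejected by the power-of-two check.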

Reviewers: mcrosier, jmolloy, rengolin

Subscribers: aemerson, rengolin, llvm-commits

Differential Revision: http://reviews.llvm.org/D21372

llvm-svn: 273581
2016-06-23 16:53:49 +00:00
Artur Pilipenko 80771b9ad9 Upgrade other old memset/memcpy signatures in tests causing buildbot failures with rL273568.
llvm-svn: 273580
2016-06-23 16:34:52 +00:00
Artur Pilipenko 4fec7b7131 Fix an old memset signature in 2009-09-01-PostRAProlog.ll test causing a buildbot failure
llvm-svn: 273573
2016-06-23 16:07:10 +00:00
Simon Pilgrim 595dddb103 [X86][AVX512] Added AVX512F vector sign extend tests
Now that Elena has confirmed that PR26474 has been fixed

llvm-svn: 273560
2016-06-23 14:01:45 +00:00
Daniel Sanders de393329b9 [mips] Don't derive the default ABI from the CPU in the backend.
Summary:
The backend has no reason to behave like a driver and should generally do
as it's told (and error out if it can't) instead of trying to figure out
what the API user meant. The default ABI is still derived from the arch
component as a concession to backwards compatibility.

API-users that previously passed an explicit CPU and a triple that was
inconsistent with the CPU (e.g. mips-linux-gnu and mips64r2) may get a
different ABI to what they got before. However, it's expected that there
are no such users on the basis that CodeGen has been asserting that the
triple is consistent with the selected ABI for several releases. API-users
that were consistent or passed '' or 'generic' as the CPU will see no
difference.

Reviewers: sdardis, rafael

Subscribers: rafael, dsanders, sdardis, llvm-commits

Differential Revision: http://reviews.llvm.org/D21466

llvm-svn: 273557
2016-06-23 12:42:53 +00:00
Diana Picus e440f99913 [AMDGPU] Remove exit-on-error in test (PR27761)
The exit-on-error flag was necessary in order to avoid an assertion when
handling DYNAMIC_STACKALLOC nodes in SelectionDAGLegalize.

We can avoid the assertion by creating some dummy nodes. This enables us to
remove the exit-on-error flag on the first 2 run lines (SI), but on the third
run line (R600) we would run into another assertion when trying to reserve
indirect registers. This patch also replaces that assertion with an early exit
from the function.

Fixes PR27761.

Differential Revision: http://reviews.llvm.org/D20852

llvm-svn: 273550
2016-06-23 09:19:16 +00:00
Craig Topper 597aa42fec [AVX512] Remove masked unpack intrinsics and autoupgrade to vectorshuffle and selects.
llvm-svn: 273543
2016-06-23 07:37:33 +00:00
Matt Arsenault 3cb4ddeb4e AMDGPU: Fix liveness when expanding m0 loop
llvm-svn: 273514
2016-06-22 23:40:57 +00:00
Sanjoy Das e57bf680ec [ImplicitNullChecks] Hoist trivial dependencies if possible
When trying to convert a loading instruction into a FAULTING_LOAD, we
sometimes face code like this:

  if %R10 is not null:
    %R9<def> = MOV32ri Immediate
    %R9<def, tied> = AND32rm %R9, 0x20(%R10)
  else:
    goto TRAP

In these cases we would like to use the AND32rm instruction as the
faulting operation by hoisting the "dependency" def-ing %R9 also above
the control flow, transforming the program into:

  %R9<def> = MOV32ri Immediate
  %R9<def, tied> = FAULTING_LOAD_OP(AND32rm %R9, 0x20(%R10), FailPath: TRAP)

This change teaches ImplicitNullChecks to do the above, when safe.

llvm-svn: 273501
2016-06-22 22:16:51 +00:00
Rafael Espindola 928a95d0b0 Use shouldAssumeDSOLocal.
With this it handles -fPIE.

llvm-svn: 273499
2016-06-22 22:09:17 +00:00
Changpeng Fang 47efe1f6db AMDGPU/SI: Define an intrinsic to expose ds_swizzle_b32
Reviewers: tstellarAMD, arsenm

Differential Revision: http://reviews.llvm.org/D21533

llvm-svn: 273496
2016-06-22 21:33:49 +00:00
Peter Collingbourne 6d88fde3af IR: Introduce Module::global_objects().
This is a convenience iterator that allows clients to enumerate the
GlobalObjects within a Module.

Also start using it in a few places where it is obviously the right thing
to use.

Differential Revision: http://reviews.llvm.org/D21580

llvm-svn: 273470
2016-06-22 20:29:42 +00:00
Matt Arsenault 9babdf4265 AMDGPU: Fix verifier errors in SILowerControlFlow
The main sin this was committing was using terminator
instructions in the middle of the block, and then
not updating the block successors / predecessors.
Split the blocks up to avoid this and introduce new
pseudo instructions for branches taken with exec masking.

Also use a pseudo instead of emitting s_endpgm and erasing
it in the special case of a non-void return.

llvm-svn: 273467
2016-06-22 20:15:28 +00:00
Krzysztof Parzyszek f7f7068109 [Hexagon] Add SDAG preprocessing step to expose shifted addressing modes
Transform: (store ch addr (add x (add (shl y c) e)))
       to: (store ch addr (add x (shl (add y d) c))),
where e = (shl d c) for some integer d.
The purpose of this is to enable generation of loads/stores with
shifted addressing mode, i.e. mem(x+y<<#c). For that, the shift
value c must be 0, 1 or 2.
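
The rewrite relies on the shift distributing over the addition. A small sketch of the arithmetic, using hypothetical helper names and 32-bit unsigned (wrapping) values as an assumption about the address width:

```cpp
#include <cstdint>

// Address before the transform: x + ((y << c) + e).
constexpr std::uint32_t addr_before(std::uint32_t x, std::uint32_t y,
                                    unsigned c, std::uint32_t e) {
  return x + ((y << c) + e);
}

// Address after the transform: x + ((y + d) << c), where e == d << c.
constexpr std::uint32_t addr_after(std::uint32_t x, std::uint32_t y,
                                   unsigned c, std::uint32_t d) {
  return x + ((y + d) << c);
}

// Example: c = 2 and e = 12 = 3 << 2, so d = 3; both forms give 0x1020.
static_assert(addr_before(0x1000, 5, 2, 12) == addr_after(0x1000, 5, 2, 3),
              "both forms compute the same address");
```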

llvm-svn: 273466
2016-06-22 20:08:27 +00:00
Chad Rosier 8c106bcbe8 [AArch64] Remove an overly aggressive assert.
llvm-svn: 273458
2016-06-22 19:18:52 +00:00
Rafael Espindola 8474fdf90d Start using shouldAssumeDSOLocal on Hexagon.
Include a token test showing that access to private is now the same as
to internal.

llvm-svn: 273457
2016-06-22 19:09:14 +00:00
Wei Ding 0526e7f8d9 AMDGPU: Add convergent flag to INLINEASM instruction.
Differential Revision: http://reviews.llvm.org/D21214

llvm-svn: 273455
2016-06-22 18:51:08 +00:00
Zhan Jun Liau 0df350589f [SystemZ] Recognize RISBG opportunities involving a truncate
Summary:
Recognize RISBG opportunities where the end result is narrower than the
original input - where a truncate separates the shift/and operations.

The motivating case is some code in postgres which looks like:

	srlg	%r2, %r0, 11
	nilh	%r2, 255

Reviewers: uweigand

Author: RolandF

Differential Revision: http://reviews.llvm.org/D21452

llvm-svn: 273433
2016-06-22 16:16:27 +00:00
Krzysztof Parzyszek f228c95f87 [Hexagon] Handle expansion of cmpxchg
llvm-svn: 273432
2016-06-22 16:07:10 +00:00
Artur Pilipenko 1cec4fdddf Upgrade old memset/memcpy signatures (without isVolatile argument) in tests
We no longer have corresponding code in autoupgrade, and the vast majority of the tests were fixed a long time ago. Fix the remaining few. One of the verifier test cases is marked as XFAIL because it was passing only because the signature was incorrect.

llvm-svn: 273428
2016-06-22 15:16:06 +00:00
Simon Pilgrim 1536c19642 Regenerated test
llvm-svn: 273404
2016-06-22 12:58:15 +00:00
Jan Vesely fea814d531 AMDGPU: Add implicitarg.ptr intrinsic.
Points to the start of implicit arguments (appended after explicit arguments)

Differential Revision: http://reviews.llvm.org/D20297

llvm-svn: 273317
2016-06-21 20:46:20 +00:00
Artem Belevich d7ebcfb291 [NVPTX] Improve lowering of byval args of device functions.
Avoid unnecessary spills of such vars to local space on SASS level and
pointer space conversion.

Instead, make a local copy with appropriate addrspacecasts and let
LLVM optimize them away when possible.

This allows loading value of the argument using [symbol+offset]
instead of converting argument to general space pointer and using it
for indexing (which also implicitly converts param space pointer to
local space one on SASS level and triggers copying of argument into
local space in the process).

This reduces call overhead, uses fewer registers and reduces overall
SASS size by 2-4%.

Differential Review: http://reviews.llvm.org/D21421

llvm-svn: 273313
2016-06-21 20:30:26 +00:00
Silviu Baranga 03b6a4fc88 [AArch64] Fix merge-store.ll regression test after r273271
r273271 changed the RUN line of the regression test to use
-march=cyclone instead of -mtriple=aarch64-none-none.

This caused a change in the output syntax for the ext
instruction, causing the test to fail. Change this test
back to using -mtriple=aarch64-none-none.

llvm-svn: 273286
2016-06-21 17:15:49 +00:00
Etienne Bergeron f6be62f2c8 [StackProtector] Fix computation of GSCookieOffset and EHCookieOffset with SEH4
Summary:
Fix the computation of the offsets present in the scopetable when using the
SEH (__except_handler4).

This patch added an intrinsic to track the position of the allocation on the
stack of the EHGuard. This position is needed when producing the ScopeTable.

```
    struct _EH4_SCOPETABLE {
        DWORD GSCookieOffset;
        DWORD GSCookieXOROffset;
        DWORD EHCookieOffset;
        DWORD EHCookieXOROffset;
        _EH4_SCOPETABLE_RECORD ScopeRecord[1];
    };

    struct _EH4_SCOPETABLE_RECORD {
        DWORD EnclosingLevel;
        long (*FilterFunc)();
        union {
            void (*HandlerAddress)();
            void (*FinallyFunc)();
        };
    };
```

The code to generate the EHCookie is added in `X86WinEHState.cpp`,
which emits these instructions when using SEH4.

```
Lfunc_begin0:
# BB#0:                                 # %entry
	pushl	%ebp
	movl	%esp, %ebp
	pushl	%ebx
	pushl	%edi
	pushl	%esi
	subl	$28, %esp
	movl	%ebp, %eax                <<-- Loading FramePtr
	movl	%esp, -36(%ebp)
	movl	$-2, -16(%ebp)
	movl	$L__ehtable$use_except_handler4_ssp, %ecx
	xorl	___security_cookie, %ecx
	movl	%ecx, -20(%ebp)
	xorl	___security_cookie, %eax  <<-- XOR FramePtr and Cookie
	movl	%eax, -40(%ebp)           <<-- Storing EHGuard
	leal	-28(%ebp), %eax
	movl	$__except_handler4, -24(%ebp)
	movl	%fs:0, %ecx
	movl	%ecx, -28(%ebp)
	movl	%eax, %fs:0
	movl	$0, -16(%ebp)
	calll	_may_throw_or_crash
LBB1_1:                                 # %cont
	movl	-28(%ebp), %eax
	movl	%eax, %fs:0
	addl	$28, %esp
	popl	%esi
	popl	%edi
	popl	%ebx
	popl	%ebp
	retl

```

And the corresponding offset is computed:
```
Luse_except_handler4_ssp$parent_frame_offset = -36
	.p2align	2
L__ehtable$use_except_handler4_ssp:
	.long	-2                      # GSCookieOffset
	.long	0                       # GSCookieXOROffset
	.long	-40                     # EHCookieOffset    <<----
	.long	0                       # EHCookieXOROffset
	.long	-2                      # ToState
	.long	_catchall_filt          # FilterFunction
	.long	LBB1_2                  # ExceptionHandler

```

Clang is not yet producing functions using SEH4, but it's a work in progress.
This patch is a step toward having a valid implementation of SEH4.
Unfortunately, it is not yet fully working. The EH registration block is not
allocated at the right offset on the stack.

Reviewers: rnk, majnemer

Subscribers: llvm-commits, chrisha

Differential Revision: http://reviews.llvm.org/D21231

llvm-svn: 273281
2016-06-21 15:58:55 +00:00
Evandro Menezes 230083ff9d [AArch64] Change the preferred alignment for char and short to word alignment
Differential Revision: http://reviews.llvm.org/D21414

llvm-svn: 273279
2016-06-21 15:55:18 +00:00
Silviu Baranga dc43d61a25 [AArch64] Switch regression tests to test features not CPUs
Summary:
We have switched to using features for all heuristics, but
the tests for these are still using -mcpu, which means we
are not directly testing the features.

This converts at least some of the existing regression tests
to use the new features.

This still leaves the following features untested:

merge-narrow-ld
predictable-select-expensive
alternate-sextload-cvt-f32-pattern
disable-latency-sched-heuristic

Reviewers: mcrosier, t.p.northover, rengolin

Subscribers: MatzeB, aemerson, llvm-commits, rengolin

Differential Revision: http://reviews.llvm.org/D21288

llvm-svn: 273271
2016-06-21 15:16:34 +00:00
Daniel Sanders bf2c03ee69 [arm+x86] Make GNU variants behave like GNU w.r.t combining sin+cos into sincos.
Summary:
canCombineSinCosLibcall() would previously combine sin+cos into sincos for
GNUX32/GNUEABI/GNUEABIHF regardless of whether UnsafeFPMath were set or not.
However, GNU would only combine them for UnsafeFPMath because sincos does not
set errno like sin and cos do. It seems likely that this is an oversight.

Reviewers: t.p.northover

Subscribers: t.p.northover, aemerson, llvm-commits, rengolin

Differential Revision: http://reviews.llvm.org/D21431

llvm-svn: 273259
2016-06-21 12:29:03 +00:00
Craig Topper 283418fbb6 [AVX512] Add patterns for any-extending a mask that use the def of KMOVW/KMOVB without going through an EXTRACT_SUBREG and a MOVZX.
llvm-svn: 273253
2016-06-21 07:37:32 +00:00
Craig Topper 9038aa3001 [AVX512] Use update_llc_test_checks.py to regenerate a test in preparation for a future commit.
llvm-svn: 273252
2016-06-21 07:37:27 +00:00
James Y Knight 03c1415b8f Revert "Change RelaxELFRelocations for llc."
This reverts commit r273019.

From email I sent to list:
> I don't think this makes sense. Either the linker you're using supports
> this feature, or it doesn't. Having it enabled for llc if your linker
> doesn't support it is not fun.
>
> Further note that this also affects basically all other code using llvm
> libraries -- other than Clang, which explicitly sets it back to false by
> default, unless you set the ENABLE_X86_RELAX_RELOCATIONS cmake flag to
> true.
>
> If you want to enable the relax mode across all llvm tools in some
> circumstances, I think it should be via moving the cmake flag from clang
> down into llvm.
>
> I'm going to revert this commit, since I both think it intrinsically
> doesn't make sense to do this, and because it's breaking some of our
> tools.

llvm-svn: 273245
2016-06-21 05:40:41 +00:00
Craig Topper 0a0fb0fda1 [AVX512] Remove the masked vpcmpeq/vcmpgt intrinsics and autoupgrade them to native icmps.
llvm-svn: 273240
2016-06-21 03:53:24 +00:00
Simon Pilgrim 225b2e37a0 [X86][X87] Fix issue with sitofp i64 -> fp128 on 32-bit targets
Fix for PR27726 - sitofp i64 to fp128 was loading the merged load i64 to an x87 register, preventing legalization for conversion to fp128.

Added 32-bit tests for fp128 cast/conversions.

llvm-svn: 273210
2016-06-20 22:41:17 +00:00
Matt Arsenault 2209625387 AMDGPU: Preserve undef flag on vcc when shrinking v_cndmask_b32
The implicit operand is added by the initial instruction construction,
so this was adding an additional vcc use. The original one
was missing the undef flag the original condition had,
so the verifier would complain.

llvm-svn: 273182
2016-06-20 18:34:00 +00:00
Matt Arsenault b6d8c37e1a AMDGPU: Fold more custom nodes to undef
This will help sneak undefs past GVN into the DAG for
some tests.

Also add missing intrinsic for rsq_legacy, even though the node
was already selected to the instruction. Also start passing
the debug location to intrinsic errors.

llvm-svn: 273181
2016-06-20 18:33:56 +00:00
Matt Arsenault ff98241f37 Generalize DiagnosticInfoStackSize to support other limits
Backends may want to report errors on resources other than
stack size.

llvm-svn: 273177
2016-06-20 18:13:04 +00:00
Matt Arsenault a9720c67f1 AMDGPU: Use correct method for determining instruction size
llvm-svn: 273172
2016-06-20 17:51:32 +00:00
Rafael Espindola 959e9c8d01 Use shouldAssumeDSOLocal.
With this ARM fast isel knows that PIE variables are not preemptable.

llvm-svn: 273169
2016-06-20 17:45:33 +00:00
Tom Stellard 5350894265 AMDGPU: Add support for R_AMDGPU_REL32 relocations
Reviewers: arsenm, kzhuravl, rafael

Subscribers: arsenm, llvm-commits, kzhuravl

Differential Revision: http://reviews.llvm.org/D21401

llvm-svn: 273168
2016-06-20 17:33:43 +00:00
Tom Stellard 1c89eb7db0 AMDGPU: Emit R_AMDGPU_ABS32_{HI,LO} for scratch buffer relocations
Reviewers: arsenm, rafael, kzhuravl

Subscribers: rafael, arsenm, llvm-commits, kzhuravl

Differential Revision: http://reviews.llvm.org/D21400

llvm-svn: 273166
2016-06-20 16:59:44 +00:00
Sam Parker d616cf07b2 [ARM] Enable isel of UMAAL
TargetLowering and DAGToDAG are used to combine ADDC, ADDE and UMLAL
dags into UMAAL. Selection is split into the two phases because it
is easier to match the two patterns at those different times.

Differential Revision: http://reviews.llvm.org/D21461

llvm-svn: 273165
2016-06-20 16:47:09 +00:00
Simon Pilgrim 0a81b13f31 [X86][F16C] Added half <-> double conversion tests
llvm-svn: 273153
2016-06-20 12:51:55 +00:00
Pankaj Gode 0aab2e398a [AARCH64] Add support for Broadcom Vulcan
Adding core tuning support for new Broadcom Vulcan core (ARMv8.1A).

Differential Revision: http://reviews.llvm.org/D21500

llvm-svn: 273148
2016-06-20 11:13:31 +00:00
Igor Breger e59165ca63 [AVX512] [AVX512/AVX][Intrinsics] Fix Variable Bit Shift Right Arithmetic intrinsic lowering.
Differential Revision: http://reviews.llvm.org/D20897

llvm-svn: 273138
2016-06-20 07:05:43 +00:00
Simon Pilgrim 0887d5b02e [X86][AVX512] Added 512-bit BITREVERSE tests and enabled AVX512BW lowering support
llvm-svn: 273125
2016-06-19 20:59:19 +00:00
Simon Pilgrim 3d881a0230 [X86][SSE] Allow target shuffle combining to match masks with SM_Sentinel values
We currently only allow exact matches of shuffle mask patterns during target shuffle combining.

This patch relaxes this to permit SM_SentinelUndef in the combined shuffle to always be accepted as well as allowing exact matching of the SM_SentinelZero value.

I've adjusted some tests that were requiring exact shuffle masks to now include undef values.

Differential Revision: http://reviews.llvm.org/D21495

llvm-svn: 273119
2016-06-19 18:03:52 +00:00
Chris Dewhurst a294541c05 [SPARC] Correcting out-of-date unit tests checked in as part of r273108
llvm-svn: 273110
2016-06-19 12:52:39 +00:00
Chris Dewhurst 0c1e0026aa [SPARC] Fixes for hardware errata on LEON processor.
Passes to fix three hardware errata that appear on some LEON processor variants.

The instructions FSMULD, FMULS and FDIVS do not work as expected on some LEON processors. This change allows those instructions to be replaced with alternative instruction sequences that are known to work.

These passes only run when selected individually, or as part of a processor definition. They are not included in general SPARC processor compilations for non-LEON processors or for those LEON processors that do not have these hardware errata.

llvm-svn: 273108
2016-06-19 11:03:28 +00:00
Simon Pilgrim 9a09652a3a [X86][AVX] Added test case for PR28136
llvm-svn: 273098
2016-06-18 22:59:08 +00:00
Simon Pilgrim cd6d4352bc [X86][SSSE3] Added examples of target shuffle combining failing to match undefs in shuffle masks
llvm-svn: 273097
2016-06-18 21:18:21 +00:00
Simon Pilgrim ab009e9f41 [X86][XOP] Added fast-isel tests matching tools/clang/test/CodeGen/xop-builtins.c
llvm-svn: 273096
2016-06-18 21:07:31 +00:00
Simon Pilgrim b201678763 [X86][TBM] Added fast-isel tests matching tools/clang/test/CodeGen/tbm-builtins.c
llvm-svn: 273087
2016-06-18 17:20:52 +00:00
Vasileios Kalintiris 0cf68df6cc [mips] Emit a JALR with $rd equal to $zero, instead of a JR in MIPS32R6.
Summary:
JR is an alias of JALR with $rd=0 in the R6 ISA. Also, this fixes recursive
builds in MIPS32R6.

Reviewers: dsanders, sdardis

Subscribers: jfb, dschuff, dsanders, sdardis, llvm-commits

Differential Revision: http://reviews.llvm.org/D21370

llvm-svn: 273085
2016-06-18 15:39:43 +00:00
Matt Arsenault e935f05a94 AMDGPU: Fix kernel argument alignment impacting stack size
Don't use AllocateStack because kernel arguments have nothing
to do with the stack. The ensureMaxAlignment call was still
changing the stack alignment.

llvm-svn: 273080
2016-06-18 05:15:53 +00:00
Simon Pilgrim f4b2af1b9f [X86][SSE4A] Autoupgrade and remove MOVNTSD/MOVNTSS intrinsics
Required better annotation of the instruction defs upon removal of the builtin intrinsic pattern.

llvm-svn: 273077
2016-06-18 02:38:26 +00:00
Matt Arsenault 0bb294b224 AMDGPU: Temporarily select trap to s_endpgm
This should select to s_trap, but that requires
additional work to set up and enable the trap handler.
For now emit s_endpgm so bugpoint stops getting stuck
on the unsupported call to abort.

Emit a warning that this will only terminate the wave and
not really trap.

llvm-svn: 273062
2016-06-17 22:27:03 +00:00
Matt Arsenault 8885910f8e AMDGPU: Remove llvm.SI.tid intrinsic
Mesa doesn't emit this for llvm >= 3.8 anymore.

llvm-svn: 273050
2016-06-17 21:18:41 +00:00
Marcin Koscielnicki fd4b6b9e51 [SelectionDAG] Don't treat library calls specially if marked with nobuiltin.
To be used by D19781.

Differential Revision: http://reviews.llvm.org/D19801

llvm-svn: 273039
2016-06-17 20:24:07 +00:00
Michael Kuperstein 18d6d3d95e [X86] Add missing AVX512 anyext patterns.
Add AVX512 anyext patterns for i16 and i64, modeled on the existing i8 and
i32 patterns.

llvm-svn: 273038
2016-06-17 20:21:17 +00:00
Tim Northover 28a9e7f4ba ARM: take account of possible bundle when erasing an instruction.
Fortunately this appears to be the only ARM-specific pass that runs while
bundles might be in play, so no other cases need modifying.

llvm-svn: 273029
2016-06-17 18:40:46 +00:00
James Y Knight 148a6469dc Support expanding partial-word cmpxchg to full-word cmpxchg in AtomicExpandPass.
Many CPUs only have the ability to do a 4-byte cmpxchg (or ll/sc), not 1
or 2-byte. For those, you need to mask and shift the 1 or 2 byte values
appropriately to use the 4-byte instruction.
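
The mask-and-shift idea can be sketched in plain C++ with std::atomic. This is not the IR that AtomicExpandPass emits; the function name is hypothetical, and counting the byte index from the least significant end is an assumption (a real lowering has to account for the target's endianness):

```cpp
#include <atomic>
#include <cassert>
#include <cstdint>

// Emulate a 1-byte compare-and-swap using only a 4-byte compare-and-swap.
// `word` is the aligned 32-bit word that contains the byte; `byte_index`
// (0..3) selects the byte, counted from the least significant end.
inline bool cmpxchg_byte(std::atomic<std::uint32_t>& word, unsigned byte_index,
                         std::uint8_t expected, std::uint8_t desired) {
  assert(byte_index < 4);
  const unsigned shift = byte_index * 8;
  const std::uint32_t mask = std::uint32_t{0xFF} << shift;

  std::uint32_t old_word = word.load();
  for (;;) {
    // Fail only if the byte of interest differs from the expected value.
    if (((old_word & mask) >> shift) != expected)
      return false;
    // Splice the desired byte into the otherwise unchanged word.
    const std::uint32_t new_word =
        (old_word & ~mask) | (std::uint32_t{desired} << shift);
    if (word.compare_exchange_weak(old_word, new_word))
      return true;
    // On failure old_word holds the current value; retry, because the
    // mismatch may have come from a neighbouring byte or a spurious failure.
  }
}
```

The retry loop is the important design point: a full-word compare can fail even though the byte in question still matches, so the narrow operation must re-check its own byte before giving up.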

This change adds support for cmpxchg-based instruction sets (only SPARC,
in LLVM). The support can be extended for LL/SC-based PPC and MIPS in
the future, supplanting the ISel expansions those architectures
currently use.

Tests added for the IR transform and SPARCv9.

Differential Revision: http://reviews.llvm.org/D21029

llvm-svn: 273025
2016-06-17 18:11:48 +00:00
Rafael Espindola 9f86baebe0 Change RelaxELFRelocations for llc.
As a developer tool it makes sense for it to use the new relocations.

llvm-svn: 273019
2016-06-17 17:43:41 +00:00
Simon Pilgrim 6a35e5ab97 [X86][SSE4A] Remove the GCCBuiltins from the movntsd/movntss intrinsic defs so we can emit native IR from clang.
Clang-side sibling commit to follow.

llvm-svn: 273002
2016-06-17 14:27:38 +00:00
Ranjeet Singh 39d2d097d6 [ARM] Add support for mrrc/mrrc2 intrinsics.
Reapplying patch as it was reverted when it was first
committed because of an assertion failure when the
mrrc2 intrinsic was called in ARM mode. The failure
was happening because the instruction was being built
in ARMISelDAGToDAG.cpp and the tablegen description for
mrrc2 instruction doesn't allow you to use a predicate.

The ARM architecture manuals do say that mrrc2 in ARM
mode can be predicated with AL in assembly but this has
no effect on the encoding of the instruction as the top
4 bits will always be 1111 not 1110 which is the encoding
for the condition AL.

Differential Revision: http://reviews.llvm.org/D21408

llvm-svn: 272982
2016-06-17 00:52:41 +00:00
Sanjay Patel 0e9afea3c8 [x86] autoupgrade and remove AVX2 integer min/max intrinsics
This will (hopefully very temporarily) break clang.
The clang side of this should be the next commit.

llvm-svn: 272932
2016-06-16 18:44:20 +00:00
Rafael Espindola 5a07687a8e dos2unix this test. NFC.
llvm-svn: 272928
2016-06-16 18:21:11 +00:00
Sanjay Patel d09a21682f remove old FileCheck lines that are no longer used
llvm-svn: 272921
2016-06-16 17:04:16 +00:00
Sanjay Patel f664f3a578 [DAG] Remove redundant FMUL in Newton-Raphson SQRT code
When calculating a square root using Newton-Raphson with two constants,
a naive implementation is to use five multiplications (four muls to calculate
reciprocal square root and another one to calculate the square root itself).
However, after some reassociation and CSE the same result can be obtained
with only four multiplications. Unfortunately, there's no reliable way to do
such a reassociation in the back-end. So, the patch modifies NR code itself
so that it directly builds optimal code for SQRT and doesn't rely on any
further reassociation.
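
A scalar sketch of the two formulations, assuming the usual one-step refinement with the constants -0.5 and -3.0 and an initial reciprocal square root estimate `est`. The function names are hypothetical and this is not the DAG-building code itself:

```cpp
// Naive form: refine the reciprocal square root estimate (four multiplies),
// then multiply by the input to obtain the square root (a fifth multiply).
constexpr float sqrt_nr_five_muls(float a, float est) {
  const float aee = (a * est) * est;                     // 2 multiplies
  const float refined = (est * -0.5f) * (aee + -3.0f);   // 2 multiplies
  return a * refined;                                    // 1 multiply
}

// Reassociated form: reuse a * est so that the final multiply by `a`
// disappears, leaving four multiplies in total.
constexpr float sqrt_nr_four_muls(float a, float est) {
  const float ae = a * est;                              // 1 multiply
  const float aee = ae * est;                            // 1 multiply
  return (ae * -0.5f) * (aee + -3.0f);                   // 2 multiplies
}
```

Both functions compute the same real-valued expression; only the association of the multiplications differs, so the results may differ in the last bits for inexact estimates.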

Patch by Nikolai Bozhenov!

Differential Revision: http://reviews.llvm.org/D21127

llvm-svn: 272920
2016-06-16 16:58:54 +00:00
Rafael Espindola afade35003 Don't print (PLT) on arm.
The R_ARM_PLT32 relocation is deprecated and is not produced by MC.

This means that the code being deleted is dead from the .o point of
view and was making the .s more confusing.

llvm-svn: 272909
2016-06-16 16:09:53 +00:00
Sanjay Patel 51ab757941 [x86] autoupgrade and remove SSE2/SSE41 integer min/max intrinsics
Follow-up to:
http://reviews.llvm.org/rL272806
http://reviews.llvm.org/rL272807

llvm-svn: 272907
2016-06-16 15:48:30 +00:00
Daniel Sanders de7816b0cd [mips][mips16] Fix machine verifier errors about incorrect register classes on load/stores.
Summary:
[ls][bh] and [ls][bh]u cannot use sp-relative addresses and must therefore
lower frameindex nodes such that there is a copy to a CPU16Regs register. This
is now done consistently using a separate addressing mode that does not
permit frameindex nodes.

As part of this I've had to remove an optimization that reduced the number of
instructions needed to work around the lack of sp-relative addresses on [ls][bh]
and [ls][bh]u. This optimization used one of the eight CPU16Regs registers as
a copy of the stack pointer and its implementation was the root cause of many
of the register vs register class mismatches.

lw/sw can use sp-relative addresses but we ought to ensure that we use the
correct version of lw/sw internally for things like IAS. This is not currently
the case and this change does not fix this. However, this change does clean it
up sufficiently well to fix the machine verifier failures.

Also removed irrelevant functions from stchar.ll.

Reviewers: sdardis

Subscribers: dsanders, sdardis, llvm-commits

Differential Revision: http://reviews.llvm.org/D21062

llvm-svn: 272882
2016-06-16 10:20:59 +00:00
Daniel Sanders 1d14864bb3 [llvm-objdump] Support detection of feature bits from the object and implement this for Mips.
Summary:
The Mips implementation only covers the feature bits described by the ELF
e_flags so far. Mips stores additional feature bits such as MSA in the
.MIPS.abiflags section.

Also fixed a small bug this revealed where microMIPS wouldn't add the
EF_MIPS_MICROMIPS flag when using -filetype=obj.

Reviewers: echristo, rafael

Subscribers: rafael, mehdi_amini, dsanders, sdardis, llvm-commits

Differential Revision: http://reviews.llvm.org/D21125

llvm-svn: 272880
2016-06-16 09:17:03 +00:00
Hrvoje Varga f1e0a03d08 [mips][micromips] Implement DCLO, DCLZ, DROTR, DROTR32 and DROTRV instructions
Differential Revision: http://reviews.llvm.org/D16917

llvm-svn: 272876
2016-06-16 07:06:25 +00:00
Tim Northover daa1c018b0 AArch64: allow MOV (imm) alias to be printed
The backend has been around for years, it's pretty ridiculous that we can't
even use the preferred form for printing "MOV" aliases. Unfortunately, TableGen
can't handle the complex predicates when printing so it's a bunch of nasty C++.
Oh well.

llvm-svn: 272865
2016-06-16 01:42:25 +00:00
Matt Arsenault 191763026c AMDGPU: Disable scheduling in some slow tests
Disabling the pre-RA scheduler on large-work-group-registers
causes it to be ~50% slower.

llvm-svn: 272860
2016-06-16 00:56:47 +00:00
Sanjay Patel 74b40bdb53 [x86, SSE] update packed FP compare tests for direct translation from builtin to IR
The clang side of this was r272840:
http://reviews.llvm.org/rL272840

A follow-up step would be to auto-upgrade and remove these LLVM intrinsics completely.

Differential Revision: http://reviews.llvm.org/D21269

llvm-svn: 272841
2016-06-15 21:22:15 +00:00
Sanjay Patel 0b526676ab [x86] delete unnecessary function declarations
Missed this in r272806, r272807.

llvm-svn: 272834
2016-06-15 20:51:47 +00:00
Tim Northover 389a1e39ea AArch64: stop trying to use 32-bit MOVZs when expanding patchpoints.
Of course the assembly was right but because the opcode was MOVZWi it was
encoded as "movz w16, #65535, lsl #32" which is an unallocated encoding and
would go horribly wrong on a CPU.
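
As a minimal sketch of the encoding constraint, assuming the usual A64 rule that the 32-bit (W) forms of MOVZ/MOVK accept shifts of 0 or 16 while the 64-bit (X) forms accept 0, 16, 32 or 48. The helper names are hypothetical and are not taken from the LLVM sources.

```python
def movz_shift_is_valid(is_64bit, shift):
    """Return True if `movz <reg>, #imm16, lsl #shift` has an allocated encoding.

    Assumption: W-register forms allow shifts 0 and 16 only,
    X-register forms allow 0, 16, 32 and 48.
    """
    allowed = (0, 16, 32, 48) if is_64bit else (0, 16)
    return shift in allowed


def split_imm16_chunks(value, num_chunks=3):
    """Split a value into (imm16, shift) pairs, highest chunk first,
    in the order a MOVZ followed by MOVKs would consume them."""
    return [((value >> shift) & 0xFFFF, shift)
            for shift in range(16 * (num_chunks - 1), -1, -16)]
```

With these helpers, a chunk at shift 32 is only usable with the 64-bit opcode, which is the mismatch the message describes.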

No idea how this bug survived this long. It seems nobody is using that aspect
of patchpoints.

llvm-svn: 272831
2016-06-15 20:33:36 +00:00
Sanjay Patel 1a4569df54 [x86] add folds for x86 vector compare nodes (PR27924)
Ideally, we can get rid of most x86 LLVM intrinsics by transforming them to IR (and some of that happened 
with http://reviews.llvm.org/rL272807), but it doesn't cost much to have some simple folds in the backend
too while we're working on that and as a backstop.

This fixes:
https://llvm.org/bugs/show_bug.cgi?id=27924

Differential Revision: http://reviews.llvm.org/D21356

llvm-svn: 272828
2016-06-15 20:26:58 +00:00
Kevin B. Smith acbda9ef30 [X86]: Updated r272801 to promote 16 bit compares with immediate operand
to 32 bits. This is in response to a comment by Eli Friedman.

llvm-svn: 272814
2016-06-15 18:18:05 +00:00
Sanjay Patel a6c6f09967 [x86, SSE] remove the GCCBuiltins from the integer min/max intrinsics
This allows us to emit native IR in Clang (next commit).
Also, update the intrinsic tests to show that codegen already knows how to handle
the IR that Clang will soon produce.

llvm-svn: 272806
2016-06-15 17:17:27 +00:00
Kevin B. Smith 54566a0e9a [X86]: Quit promoting 8 and 16 bit compares to 32 bit.
Differential Revision: http://reviews.llvm.org/D21144

llvm-svn: 272801
2016-06-15 16:37:46 +00:00
Kevin B. Smith c3c82cdbd0 [X86]: Improve Liveness checking for X86FixupBWInsts.cpp
Differential Revision: http://reviews.llvm.org/D21085

llvm-svn: 272797
2016-06-15 16:03:06 +00:00
Ranjeet Singh 0db7be886e Reverting r272778 because there's an assertion
failure when running the test CodeGen/ARM/intrinsics-coprocessor.ll

llvm-svn: 272791
2016-06-15 14:23:29 +00:00
Simon Dardis 7bdf183ac1 [mips] Missing test case
Add missing testcase from r272666.

llvm-svn: 272784
2016-06-15 13:49:58 +00:00
Ranjeet Singh 351364fe76 [ARM] Add support for mrrc/mrrc2 intrinsics.
Differential Revision: http://reviews.llvm.org/D21178

llvm-svn: 272778
2016-06-15 11:32:24 +00:00
Daniel Sanders df3185d2ea [mips] Removed invalid test from o32_cc.ll
MIPS32R1 cannot implement a 64-bit FPU because this was introduced in MIPS32R2.

llvm-svn: 272769
2016-06-15 09:47:27 +00:00
Daniel Sanders d3bb20821d [mips][msa] Fix register/register-class mismatches in emitINSERT_DF_VIDX().
Reviewers: sdardis

Subscribers: dsanders, sdardis, llvm-commits

Differential Revision: http://reviews.llvm.org/D21068

llvm-svn: 272765
2016-06-15 08:43:23 +00:00
Zlatko Buljan d2ed9c6c2c [mips][microMIPS] Add CodeGen support for AND*, OR16, OR*, XOR*, NOT16 and NOR instructions
Differential Revision: http://reviews.llvm.org/D16719

llvm-svn: 272764
2016-06-15 07:46:24 +00:00
Igor Breger 64cfd3a442 [AVX512] Fix BLENDM lowering patterns. Operands should be swapped to match SELECT behavior.
Use BLENDM instead of masked move instruction.

Differential Revision: http://reviews.llvm.org/D21001

llvm-svn: 272763
2016-06-15 07:30:38 +00:00
Nicolai Haehnle a609259832 AMDGPU: Fix MUBUF offset bugs affecting llvm.amdgcn.buffer.* intrinsics
Summary:
This fixes two related bugs. First, the generic optimization passes
unfortunately generate negative constant offsets but the hardware treats
SOffset as an unsigned value.
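
To illustrate the first bug, a small sketch of how a negative constant offset reads when reinterpreted as unsigned. The 32-bit width and the helper name are assumptions made for the example, not details taken from the patch.

```python
def as_unsigned32(value):
    """Reinterpret a (possibly negative) integer as an unsigned 32-bit value."""
    return value & 0xFFFFFFFF


# A small negative offset becomes a very large unsigned one.
negative_offset = -4
seen_by_hardware = as_unsigned32(negative_offset)
```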

Second, there is a hardware bug on SI and CI, where address clamping in MUBUF
instructions does not work correctly when SOffset is larger than the buffer
size. This patch works around this bug by never using SOffset.

An alternative workaround would be to do the clamping manually when SOffset
is too large, but generating the required code sequence during instruction
selection would be rather involved, and in any case the resulting code would
probably be worse.

Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=96360

Reviewers: arsenm, tstellarAMD

Subscribers: arsenm, llvm-commits, kzhuravl

Differential Revision: http://reviews.llvm.org/D21326

llvm-svn: 272761
2016-06-15 07:13:05 +00:00
Sanjoy Das 0272be206a Don't force SP-relative addressing for statepoints
Summary:
...  when the offset is not statically known.

Prioritize addresses relative to the stack pointer in the stackmap, but
fallback gracefully to other modes of addressing if the offset to the
stack pointer is not a known constant.

Patch by Oscar Blumberg!

Reviewers: sanjoy

Subscribers: llvm-commits, majnemer, rnk, sanjoy, thanm

Differential Revision: http://reviews.llvm.org/D21259

llvm-svn: 272756
2016-06-15 05:35:14 +00:00
David Majnemer cbf614a93b Remove the ScalarReplAggregates pass
Nearly all the changes to this pass have been done while maintaining and
updating other parts of LLVM.  LLVM has had another pass, SROA, which
has superseded ScalarReplAggregates for quite some time.

Differential Revision: http://reviews.llvm.org/D21316

llvm-svn: 272737
2016-06-15 00:19:09 +00:00
Matt Arsenault f42c69206d AMDGPU: Run pointer optimization passes
llvm-svn: 272736
2016-06-15 00:11:01 +00:00
Xinliang David Li 8052238ac0 Fix a test case to match its intention
llvm-svn: 272733
2016-06-14 23:05:46 +00:00
Dehao Chen 9f2bdfb40f Set machine block placement hot prob threshold for both static and runtime profile.
Summary: With runtime profile, we have more confidence in branch probability, thus during basic block layout, we set a lower hot prob threshold so that blocks can be laid out optimally.

Reviewers: djasper, davidxl

Subscribers: llvm-commits

Differential Revision: http://reviews.llvm.org/D20991

llvm-svn: 272729
2016-06-14 22:27:17 +00:00
Sanjay Patel 4c3cb8b6c0 [x86] add current codegen tests for PR27924
llvm-svn: 272714
2016-06-14 21:25:46 +00:00
Peter Collingbourne 96efdd6107 IR: Introduce local_unnamed_addr attribute.
If a local_unnamed_addr attribute is attached to a global, the address
is known to be insignificant within the module. It is distinct from the
existing unnamed_addr attribute in that it only describes a local property
of the module rather than a global property of the symbol.

This attribute is intended to be used by the code generator and LTO to allow
the linker to decide whether the global needs to be in the symbol table. It is
possible to exclude a global from the symbol table if three things are true:
- This attribute is present on every instance of the global (which means that
  the normal rule that the global must have a unique address can be broken without
  being observable by the program by performing comparisons against the global's
  address)
- The global has linkonce_odr linkage (which means that each linkage unit must have
  its own copy of the global if it requires one, and the copy in each linkage unit
  must be the same)
- It is a constant or a function (which means that the program cannot observe that
  the unique-address rule has been broken by writing to the global)
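
The three conditions above can be sketched as a single predicate. This is only an illustration: the function and parameter names are hypothetical, and a real linker would derive these facts during symbol resolution rather than receive them as flags.

```python
def can_omit_from_symbol_table(every_instance_has_attr, linkage, is_constant_or_function):
    """Return True if all three conditions listed in the message hold.

    every_instance_has_attr: local_unnamed_addr is present on every instance
    linkage: the global's linkage as a string, e.g. "linkonce_odr"
    is_constant_or_function: the global cannot be written to by the program
    """
    return (every_instance_has_attr
            and linkage == "linkonce_odr"
            and is_constant_or_function)
```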

Although this attribute could in principle be computed from the module
contents, LTO clients (i.e. linkers) will normally need to be able to compute
this property as part of symbol resolution, and it would be inefficient to
materialize every module just to compute it.

See:
http://lists.llvm.org/pipermail/llvm-commits/Week-of-Mon-20160509/356401.html
http://lists.llvm.org/pipermail/llvm-commits/Week-of-Mon-20160516/356738.html
for earlier discussion.

Part of the fix for PR27553.

Differential Revision: http://reviews.llvm.org/D20348

llvm-svn: 272709
2016-06-14 21:01:22 +00:00
Wei Mi b799a625f9 [X86] Reduce the width of multiplication when its operands are extended from i8 or i16
For <N x i32> type mul, pmuludq will be used for targets without SSE41, which
often introduces many extra pack and unpack instructions in vectorized loop
body because pmuludq generates <N/2 x i64> type value. However when the operands
of <N x i32> mul are extended from smaller size values like i8 and i16, the type
of mul may be shrunk to use pmullw + pmulhw/pmulhuw instead of pmuludq, which
generates better code. For targets with SSE41, pmulld is supported so no
shrinking is needed.
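
A scalar sketch of why the shrinking is sound for one lane, assuming both operands were zero-extended from 16 bits. The Python functions only model the low-half and unsigned high-half results of a 16x16 multiply; they are named after the instructions for readability and are not the backend's code.

```python
def pmullw(a, b):
    """Low 16 bits of the 16x16-bit product (one lane)."""
    return (a * b) & 0xFFFF


def pmulhuw(a, b):
    """High 16 bits of the unsigned 16x16-bit product (one lane)."""
    return ((a * b) >> 16) & 0xFFFF


def mul_i32_from_u16(a, b):
    """Rebuild the 32-bit product from the two 16-bit halves."""
    if not (0 <= a <= 0xFFFF and 0 <= b <= 0xFFFF):
        raise ValueError("operands must fit in 16 unsigned bits")
    return (pmulhuw(a, b) << 16) | pmullw(a, b)
```

The signed case would use the signed high-half multiply instead; it is left out of this sketch.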

Differential Revision: http://reviews.llvm.org/D20931

llvm-svn: 272694
2016-06-14 18:53:20 +00:00
Nirav Dave f8d00d5cac Fix BSS global handling in AsmPrinter
Change EmitGlobalVariable to check that the final assembler section is in BSS
before using .lcomm/.comm directive. This prevents globals from being
put into .bss erroneously when -data-sections is used.

This fixes PR26570.

Reviewers: echristo, rafael

Subscribers: llvm-commits, mehdi_amini

Differential Revision: http://reviews.llvm.org/D21146

llvm-svn: 272674
2016-06-14 15:09:30 +00:00
Simon Dardis 878c0b1b76 [mips] Optimize stack pointer adjustments.
Instead of always using addu to adjust the stack pointer when the
size is out of the range of an addiu instruction, use subu so that
a smaller constant can be generated.

This can give savings of ~3 instructions whenever a function has a
stack frame whose size is out of range of an addiu instruction.

This change may break some naive stack unwinders.

Partially resolves PR/26291.

Thanks to David Chisnall for reporting the issue.

Reviewers: dsanders, vkalintiris

Differential Review: http://reviews.llvm.org/D21321

llvm-svn: 272666
2016-06-14 13:39:43 +00:00
James Molloy 65b6be1d3a [Thumb] Fix off-by-one error in r272007
We can only generate immediates up to #510 with a MOV+ADD, not #511, because there's no such instruction as add #256.
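
The bound can be checked with a small sketch, assuming the Thumb-1 MOVS and ADDS immediates are both unsigned 8-bit values (0 to 255). The helper name is hypothetical.

```python
IMM8_MAX = 255  # largest unsigned 8-bit immediate


def split_mov_add(value):
    """Return (mov_imm, add_imm) if `value` can be built as MOV + ADD,
    or None if it is out of range."""
    if not 0 <= value <= 2 * IMM8_MAX:
        return None
    mov_imm = min(value, IMM8_MAX)
    return mov_imm, value - mov_imm
```

With this model, 510 splits into 255 + 255, while 511 would need an add of 256 and is rejected.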

Found by Oliver Stannard and csmith!

llvm-svn: 272665
2016-06-14 13:33:07 +00:00
Simon Dardis 4fbf76f7c3 [mips][atomics] Fix atomic instruction descriptions and uses.
PR27458 highlights that the MIPS backend does not have well formed
MIR for atomic operations (among other errors).

This patch expands and corrects the LL/SC descriptions and uses
for MIPS(64).

Reviewers: dsanders, vkalintiris

Differential Review: http://reviews.llvm.org/D19719

llvm-svn: 272655
2016-06-14 11:29:28 +00:00
Simon Pilgrim cf1165b86e [X86][SSE4A] Added patterns for nontemporal stores of scalar float/doubles using MOVNTSD/MOVNTSS
llvm-svn: 272651
2016-06-14 09:43:38 +00:00
Simon Dardis e661e528db [mips] MIPS32/64 itineraries
Itineraries for some pre MIPSR6 and EVA instructions. Some pseudo expanded
instructions are marked as having no scheduling info.

Reviewers: dsanders, vkalintiris

Differential Review: http://reviews.llvm.org/D20418

llvm-svn: 272648
2016-06-14 09:35:29 +00:00
Daniel Sanders 435a653437 [mips][dsp] Fix use without def on DSPCtrl registers read by rddsp intrinsic.
Reviewers: sdardis

Subscribers: dsanders, sdardis, llvm-commits

Differential Revision: http://reviews.llvm.org/D21063

llvm-svn: 272647
2016-06-14 09:29:46 +00:00
Daniel Sanders d2a49ec3ab [mips][msa] copyPhysReg() should not set RegState::Define on result of CTCMSA.
Summary:
The machine verifier reports 'Explicit operand marked as def' when it is
manually specified even though it agrees with the operand info.

Reviewers: sdardis

Subscribers: dsanders, sdardis, llvm-commits

Differential Revision: http://reviews.llvm.org/D21065

llvm-svn: 272646
2016-06-14 09:11:33 +00:00
Diana Picus bae1d89e45 [SelectionDAG] Remove exit-on-error flag from test (PR27765)
The exit-on-error flag in the ARM test is necessary in order to avoid an
unreachable in the DAGTypeLegalizer, when trying to expand a physical register.
We can also avoid this situation by introducing a bitcast early on, where the
invalid scalar-to-vector conversion is detected.

We also add a test for PowerPC, which goes through a similar code path in the
SelectionDAGBuilder.

Fixes PR27765.

Differential Revision: http://reviews.llvm.org/D21061

llvm-svn: 272644
2016-06-14 07:30:20 +00:00
Igor Breger 484bace21b re-generate the tests using the update_llc_test_checks.py script
llvm-svn: 272643
2016-06-14 07:05:10 +00:00
Craig Topper 99e30e6a66 [AVX512] Use MOVZX32 instead of MOVZ16 for loading single v8/v4/v2/v1 masks when KMOVB is not available. This has better behavior with respect to partial register stalls since it won't need to preserve the upper 16-bits of the GPR.
llvm-svn: 272626
2016-06-14 03:13:00 +00:00
Craig Topper ddab395397 [AVX512] Add patterns for zero-extending a mask that use the def of KMOVW/KMOVB without going through an EXTRACT_SUBREG and a MOVZX.
llvm-svn: 272625
2016-06-14 03:12:54 +00:00
Craig Topper cbe54a4bd9 [AVX512] Add tests for zero extending masks that show an unnecessary movzx instruction. A followup patch will remove that instruction, but adding the tests first to make the change more obvious.
llvm-svn: 272624
2016-06-14 03:12:48 +00:00
Sanjoy Das 98ac278b86 Move previously added test case to the right location
In rL272580 I accidentally added a test case to test/CodeGen when
test/Transforms/DeadStoreElimination/ is a better place for it.

llvm-svn: 272581
2016-06-13 20:12:07 +00:00
Sanjoy Das d0bdf3e02b Fix AAResults::callCapturesBefore for operand bundles
Summary:
AAResults::callCapturesBefore would previously ignore operand
bundles. It was possible for a later instruction to miss its memory
dependency on a call site that would only access the pointer through a
bundle.

Patch by Oscar Blumberg!

Reviewers: sanjoy

Differential Revision: http://reviews.llvm.org/D21286

llvm-svn: 272580
2016-06-13 19:55:04 +00:00
Simon Pilgrim 582b9ce36e [X86][SSE] Added extract to scalar nontemporal store tests
llvm-svn: 272577
2016-06-13 19:08:28 +00:00
David Majnemer 248190ba69 [X86] Remove llvm.x86.bit.scan.{forward,reverse}.32
The need for these intrinsics has been obviated by r272564 which
reimplements their functionality using generic IR.

llvm-svn: 272566
2016-06-13 17:33:13 +00:00
Marek Olsak e93f6d6923 AMDGPU/SI: Set INDEX_STRIDE for scratch coalescing
Summary:
Mesa and other users must set this to enable coalescing:
- STRIDE = 0
- SWIZZLE_ENABLE = 1

This makes one particular compute shader 8x faster.

Reviewers: tstellarAMD, arsenm

Subscribers: arsenm, kzhuravl

Differential Revision: http://reviews.llvm.org/D21136

llvm-svn: 272556
2016-06-13 16:05:57 +00:00
Ulrich Weigand daae87aa21 [SystemZ] Enable index register memory constraints for inline ASM
This enables use of the 'R' and 'T' memory constraints for inline ASM
operands on SystemZ, which allow an index register as well as an
immediate displacement. This patch includes corresponding documentation
and test case updates.

As with the last patch of this kind, I moved the 'm' constraint to the
most general case, which is now 'T' (base + 20-bit signed displacement +
index register).

Author: colpell
Differential Revision: http://reviews.llvm.org/D21239

llvm-svn: 272547
2016-06-13 14:24:05 +00:00
Ranjeet Singh 933e1aa39f [ARM] Reverting r272544 because clang patch needs
to go in as soon as llvm patch has gone in because
tests will start breaking in Clang.

llvm-svn: 272546
2016-06-13 10:58:24 +00:00
Ranjeet Singh 8feacb330d [ARM] Add mrrc/mrrc2 co-processor intrinsics
MRRC/MRRC2 instruction writes to two registers. The
intrinsic definition returns a single uint64_t to
represent the write, this is a compact way of
representing a write to two 32 bit registers,
the alternative might have been to return a
struct of 2 uint32_t's but this isn't as nice.
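
A sketch of the single-value representation, modelling how two 32-bit register values fit in one 64-bit result. Which register ends up in the low half is not stated here, so the low/high assignment below is an assumption, and the helper names are hypothetical.

```python
MASK32 = 0xFFFFFFFF


def pack_u64(lo, hi):
    """Combine two 32-bit values into one 64-bit value (lo in bits 0-31)."""
    return ((hi & MASK32) << 32) | (lo & MASK32)


def unpack_u64(value):
    """Split a 64-bit value back into its (lo, hi) 32-bit halves."""
    return value & MASK32, (value >> 32) & MASK32
```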

Differential Revision: 

llvm-svn: 272544
2016-06-13 10:43:50 +00:00
Strahinja Petrovic f0980e4dc0 This patch fixes handling long double type when it is
constant in soft float mode on PowerPC 32 architecture.

llvm-svn: 272543
2016-06-13 10:29:29 +00:00
Simon Pilgrim 377bc2ea43 [X86][SSE4A] Renamed tests to correspond with the instruction being tested
llvm-svn: 272542
2016-06-13 10:14:42 +00:00
Craig Topper 13cf7cac07 [AVX512] Remove masked pshufd, pshuflw, and pshufhw intrinsics and autoupgrade them to selects and shufflevector.
llvm-svn: 272527
2016-06-13 02:36:48 +00:00
Sanjay Patel 977530a8c9 [x86, SSE] change patterns for CMPP to float types to allow matching with SSE1 (PR28044)
This patch is intended to solve:
https://llvm.org/bugs/show_bug.cgi?id=28044

By changing the definition of X86ISD::CMPP to use float types, we allow it to be created 
and pass legalization for an SSE1-only target where v4i32 is not legal.

The motivational trail for this change includes:
https://llvm.org/bugs/show_bug.cgi?id=28001

and eventually makes this trigger:
http://reviews.llvm.org/D21190

Ie, after this step, we should be free to have Clang generate FP compare IR instead of x86
intrinsics for SSE C packed compare intrinsics. (We can auto-upgrade and remove the LLVM 
sse.cmp intrinsics as a follow-up step.) Once we're generating vector IR instead of x86
intrinsics, a big pile of generic optimizations can trigger.

Differential Revision: http://reviews.llvm.org/D21235

llvm-svn: 272511
2016-06-12 15:03:25 +00:00
Craig Topper 1067986c5b [X86] Remove sse2 pshufd/pshuflw/pshufhw intrinsics and upgrade them to shufflevector.
llvm-svn: 272510
2016-06-12 14:11:32 +00:00
Simon Pilgrim 9d8bed1796 [X86][BMI] Added fast-isel tests for BMI1 intrinsics
A lot of the codegen is pretty awful for these as they are mostly implemented as generic bit twiddling ops 

llvm-svn: 272508
2016-06-12 09:56:05 +00:00
Craig Topper b7713e413b [X86] Move tests for llvm.x86.avx.vpermil.* intrinsics to a -upgrade test since they are autoupgraded to shufflevector.
llvm-svn: 272494
2016-06-12 01:41:06 +00:00
Simon Pilgrim 2b7c02a04f [X86] Updated test checks script to generalise LCPI symbol refs
The script now replaces '.LCPI888_8' style asm symbols with the {{\.LCPI.*}} re pattern - this helps stop hardcoded symbols in 32-bit x86 tests changing with every edit of the file
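
A minimal sketch of that kind of substitution. The regular expression and function name are assumptions made for illustration; the expression actually used by the script may differ.

```python
import re

# Hypothetical pattern: ".LCPI", a function number, "_", a constant index.
LCPI_RE = re.compile(r"\.LCPI[0-9]+_[0-9]+")


def generalise_lcpi(asm_line):
    """Replace hardcoded constant-pool symbols with a FileCheck regex."""
    # A callable replacement avoids backslash-escape processing in the template.
    return LCPI_RE.sub(lambda match: "{{\\.LCPI.*}}", asm_line)
```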

Refreshed some tests to demonstrate the new check

llvm-svn: 272488
2016-06-11 20:39:21 +00:00
Simon Pilgrim 5b9bade8dd [X86][SSSE3] Added PSHUFB LUT implementation of BITREVERSE
PSHUFB can speed up BITREVERSE of byte vectors by performing LUT on the low/high nibbles separately and ORing the results. Wider integer vector types are already BSWAP'd beforehand so also make use of this approach.
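
The nibble lookup idea can be sketched for a single byte as follows. This is a scalar model of the technique, not the lowering code; the table and function names are hypothetical.

```python
# 16-entry table: bit-reversed value of every 4-bit nibble.
NIBBLE_REVERSED = [int(f"{n:04b}"[::-1], 2) for n in range(16)]


def bitreverse8(byte):
    """Reverse the bits of one byte using two nibble lookups and an OR."""
    low = byte & 0x0F
    high = (byte >> 4) & 0x0F
    # The reversed low nibble becomes the new high nibble, and vice versa.
    return (NIBBLE_REVERSED[low] << 4) | NIBBLE_REVERSED[high]
```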

llvm-svn: 272477
2016-06-11 15:44:13 +00:00
Craig Topper 46f49fb407 [AVX512] Re-generate v8i64 shuffle test now that we use pshufd for some cases.
llvm-svn: 272474
2016-06-11 13:57:08 +00:00
Craig Topper 504fba5c8a [AVX512] Lower v8i64 and v16i32 to pshufd when possible.
llvm-svn: 272473
2016-06-11 13:43:21 +00:00
Simon Pilgrim 6800a45790 [X86][SSE] Added PSLLDQ/PSRLDQ as a target shuffle type
Ensure that PALIGNR/PSLLDQ/PSRLDQ are byte vectors so that they can be correctly decoded for target shuffle combining

llvm-svn: 272471
2016-06-11 13:38:28 +00:00
Simon Pilgrim 8dd73e3ffa [X86][AVX2] Added PSLLDQ/PSRLDQ shuffle combining tests
llvm-svn: 272469
2016-06-11 13:18:21 +00:00
Craig Topper 40abd1cc61 [AVX512] Add support for lowering v32i16 shuffles with repeated lanes. This allows us to create 512-bit PSHUFLW/PSHUFHW.
llvm-svn: 272450
2016-06-11 03:27:42 +00:00
Quentin Colombet f2a1909bb5 [IRTranslator] Support the translation of or.
Now or instructions get translated into G_OR.

llvm-svn: 272433
2016-06-10 20:50:35 +00:00
Sanjay Patel b114fd65fc [x86] enable bitcasted fabs/fneg transforms
The vector cases don't change because we already have folds in X86ISelLowering
to look through and remove bitcasts.

llvm-svn: 272427
2016-06-10 20:33:50 +00:00
Zhan Jun Liau ab42cbce98 [SystemZ] Support Compare and Traps
Support and generate Compare and Traps like CRT, CIT, etc.

Support Trap as legal DAG opcodes and generate "j .+2" for them by default.
Add support for Conditional Traps and use the If Converter to convert them into
the corresponding compare and trap opcodes.

Differential Revision: http://reviews.llvm.org/D21155

llvm-svn: 272419
2016-06-10 19:58:10 +00:00
Tom Stellard f3af841462 AMDGPU/SI: Don't use fixup_si_rodata for scratch rsrc relocations
Summary:
We need to set the fixup type to FK_Data_4 for the
SCRATCH_RSRC_DWORD[01] symbols, since these require absolute
relocations, and fixup_si_rodata is for relative relocations.

Reviewers: arsenm, kzhuravl

Subscribers: arsenm, kzhuravl, llvm-commits

Differential Revision: http://reviews.llvm.org/D21153

llvm-svn: 272417
2016-06-10 19:26:38 +00:00
Mehdi Amini cbd68ecf04 Move CodeGen test from Generic to X86 specific directory
llvm-svn: 272416
2016-06-10 19:14:01 +00:00
Mehdi Amini 1d396832d3 Interprocedural Register Allocation (IPRA): add a Transformation Pass
Adds a MachineFunctionPass that scans the body to find calls, and
updates the register mask with the one saved by the
RegUsageInfoCollector analysis in PhysicalRegisterUsageInfo.

Patch by Vivek Pandya <vivekvpandya@gmail.com>

Differential Revision: http://reviews.llvm.org/D21180

llvm-svn: 272414
2016-06-10 18:37:21 +00:00
Sanjay Patel d558bdadd2 [x86] add test for PR28044
llvm-svn: 272411
2016-06-10 18:05:55 +00:00
Mehdi Amini bbacddfe92 Interprocedural Register Allocation (IPRA) Analysis
Add an option to enable the analysis of MachineFunction register
usage to extract the list of clobbered registers.

When enabled, the CodeGen order is changed to be bottom up on the Call
Graph.

The analysis is split into two parts. RegUsageInfoCollector is the
MachineFunction pass that runs post-RA and collects the list of
clobbered registers to produce a register mask.

An immutable pass, RegisterUsageInfo, stores the RegMasks produced by
RegUsageInfoCollector and keeps them available. A future transformation
pass will use this information to update every call site after
instruction selection.

Patch by Vivek Pandya <vivekvpandya@gmail.com>

Differential Revision: http://reviews.llvm.org/D20769

llvm-svn: 272403
2016-06-10 16:19:46 +00:00
Sanjay Patel 27f06ae7a5 [x86] fix test attributes and autogenerate checks
llvm-svn: 272398
2016-06-10 15:30:52 +00:00
Sanjay Patel cccccd9ab5 [x86] add missing tests for fcmp ueq/one
Somehow, the codegen logic for these sequences has gone completely untested
until now (note the 2 compare instructions generated per test).

There's also an *Intel* AVX optimization opportunity exposed in these cases
and the existing tests. Intel's (but not AMD's) AVX spec shows that extra FP
predicates were added, so a single comparison should always be sufficient,
and operand commutation should never be necessary.

llvm-svn: 272397
2016-06-10 15:17:54 +00:00
Sanjay Patel 330a359fb3 [x86] regenerate checks
llvm-svn: 272396
2016-06-10 14:48:50 +00:00
Simon Pilgrim 2fa2690bca [X86][SSE] Added target shuffle combine tests for byte shift/rotates (PSLLDQ/PSRLDQ/PALIGNR)
llvm-svn: 272392
2016-06-10 13:03:22 +00:00
Simon Pilgrim 34263ad995 [X86][AVX512] Added VPSLLDQ/VPSRLDQ memory fold tests
Memory operand is new for AVX512 (SSE/AVX2 didn't support it).

Also dropped the 'mask' from the tests (VPSLLDQ/VPSRLDQ don't support masked operations).

Regenerated VPALIGNR test now that the shuffle comments work

llvm-svn: 272383
2016-06-10 09:56:20 +00:00
Craig Topper 200d237e57 [AVX512] Add shuffle comment printing for masked VPERMPD/VPERMQ.
llvm-svn: 272371
2016-06-10 05:12:40 +00:00
Craig Topper 89c1761474 [AVX512] Fix shuffle comment printing to handle the masked versions of some shuffles. Previously we were printing the mask operands as the register names.
llvm-svn: 272367
2016-06-10 04:48:05 +00:00
Quentin Colombet 3198649199 [LiveRangeEdit] Add a test case for r272314.
The test case is not great, especially because it is still cumbersome to
run the regalloc pass with run-pass. (We are missing a bunch of initializers
that need to be properly implemented.)

Related to llvm.org/PR27983

llvm-svn: 272360
2016-06-10 01:57:48 +00:00
Quentin Colombet 129458a7ed [llc] Add support for several run-pass options.
Previously we could run only one machine pass with the run-pass option.
With that patch, we can now specify several passes with several run-pass
options (or just one option with a list of comma separated passes) and
llc will build the related pipeline.
This is great to test the interaction of two passes that are not
necessarily next to each other in the pipeline, or play with pass
ordering.
Now, we should be at parity with opt for the flexibility of running
passes.

Note: I also moved the run pass option from CommandFlags.h to llc.cpp
because, really, this is needed only there!

llvm-svn: 272356
2016-06-10 00:52:10 +00:00
Matt Arsenault 58ddad5bd6 AMDGPU: v_cndmask_b32 does not def vcc
Fixes verifier errors after SIShrinkInstructions.

llvm-svn: 272351
2016-06-10 00:18:41 +00:00
Tom Stellard 26a2ab7477 AMDGPU/SI: Make sure to emit TargetConstant nodes when matching ds_*permute
Summary:
This fixes a bug with ds_*permute instructions where if it was passed a
constant address, then the offset operand would get assigned a register
operand instead of an immediate.

Reviewers: scchan, arsenm

Subscribers: arsenm, llvm-commits

Differential Revision: http://reviews.llvm.org/D19994

llvm-svn: 272349
2016-06-10 00:01:04 +00:00
Matt Arsenault 7757c59e48 AMDGPU: Fix flat atomics
The flat atomics could already be selected, but only
when using flat instructions for global memory. Add
patterns for flat addresses.

llvm-svn: 272345
2016-06-09 23:42:54 +00:00
Matt Arsenault 887018179a AMDGPU: Fix i64 global cmpxchg
This was using extract_subreg sub0 to extract the low register
of the result instead of sub0_sub1, producing an invalid copy.

There doesn't seem to be a way to use the compound subreg indices
in tablegen since those are generated, so manually select it.

llvm-svn: 272344
2016-06-09 23:42:48 +00:00
Matt Arsenault 25363d37fc AMDGPU: Fix missing and broken check lines in atomic tests
llvm-svn: 272343
2016-06-09 23:42:44 +00:00
Eric Christopher 1dbb23e162 Add aliases for mfvrsave/mtvrsave.
Update a test as we're now going to emit it for easier reading of
generated assembly as well.

llvm-svn: 272339
2016-06-09 23:27:48 +00:00
Simon Pilgrim 643734c565 [X86][AVX512] Added avx512 VPSLLDQ/VPSRLDQ instruction comments
llvm-svn: 272319
2016-06-09 22:03:15 +00:00
Simon Pilgrim f718682eb9 [X86][AVX512] Dropped avx512 VPSLLDQ/VPSRLDQ intrinsics
Auto-upgrade to generic shuffles like sse/avx2 implementations now that we can lower to VPSLLDQ/VPSRLDQ 

llvm-svn: 272308
2016-06-09 21:09:03 +00:00
Simon Pilgrim 47c76e201a [X86][AVX512] Fixed issue with v16i32 shuffles lowering to VPALIGNR
llvm-svn: 272307
2016-06-09 20:53:12 +00:00
Simon Pilgrim 0ab9d3026a [X86][AVX512] Added support for lowering 512-bit vector shuffles to bit/byte shifts
512-bit VPSLLDQ/VPSRLDQ can only be used for avx512bw targets so lowerVectorShuffleAsShift had to be adjusted to include the subtarget

llvm-svn: 272300
2016-06-09 20:13:58 +00:00
Justin Lebar ed2c282d4b [NVPTX] Add intrinsics for shfl instructions.
Summary:
Currently clang emits these instructions via inline (volatile) asm in
the CUDA headers.  Switching to intrinsics will let the optimizer reason
across calls to these intrinsics.

Reviewers: tra

Subscribers: llvm-commits, jholewinski

Differential Revision: http://reviews.llvm.org/D21160

llvm-svn: 272298
2016-06-09 20:04:08 +00:00
Wei Ding ed0f97fad2 AMDGPU/SI: Fix 32-bit fdiv lowering
We were using the fast fdiv lowering for all division. This adds an
implementation of IEEE754 fdiv.

http://reviews.llvm.org/D20557

llvm-svn: 272292
2016-06-09 19:17:15 +00:00
Davide Italiano 1a7e32cc48 Also fix a typo. Need more coffee today.
llvm-svn: 272278
2016-06-09 17:06:01 +00:00
Davide Italiano f326b30a15 Improve r272262, check that __stack_chk_guard is used.
Thanks to Rafael for the suggestion.

llvm-svn: 272277
2016-06-09 17:04:38 +00:00
Jan Vesely 2da0cba5fb SelectionDAG: Implement expansion of {S,U}MIN/MAX in integer legalization
Fixes {u,}long_{min,max,clamp} opencl piglit regressions on EG.

Reviewers: arsenm
Differential Revision: http://reviews.llvm.org/D17898

llvm-svn: 272272
2016-06-09 16:04:00 +00:00
Haicheng Wu 5b458cc1f6 Reapply "[MBP] Reduce code size by running tail merging in MBP."
This reapplies commit r271930, r271915, r271923.  They hit a bug in
Thumb which is fixed in r272258 now.

The original message:

The code layout that TailMerging (inside BranchFolding) works on is not the
final layout optimized based on the branch probability. Generally, after
BlockPlacement, many new merging opportunities emerge.

This patch calls Tail Merging after MBP and calls MBP again if Tail Merging
merges anything.

llvm-svn: 272267
2016-06-09 15:24:29 +00:00
Ulrich Weigand 79564611d9 [SystemZ] Enable long displacement constraints for inline ASM operands
This enables use of the 'S' constraint for inline ASM operands on
SystemZ, which allows for a memory reference with a signed 20-bit
immediate displacement. This patch includes corresponding documentation
and test case updates.

I've changed the 'T' constraint to match the new behavior for 'S', as
'T' also uses a long displacement (though index constraints are still
not implemented). I also changed 'm' to match the behavior for 'S' as
this will allow for a wider range of displacements for 'm', though
correct me if that's not the right decision.

Author: colpell
Differential Revision: http://reviews.llvm.org/D21097

llvm-svn: 272266
2016-06-09 15:19:16 +00:00
Davide Italiano 24f1f62dca Move stackguard test to X86/ directory as it's not generic.
llvm-svn: 272264
2016-06-09 15:16:58 +00:00
Davide Italiano bd4243c519 [CodeGen] Change getSDagStackGuard to get an internal sym.
Fixes a crash in the backend during an LTO build of rtld(1) in
FreeBSD.

llvm-svn: 272262
2016-06-09 14:23:38 +00:00
Igor Breger f635367e2b [AVX512] Remove masked_move/blendm intrinsic from back-end.
This is complement patch to D21060.

Differential Revision: http://reviews.llvm.org/D21174

llvm-svn: 272257
2016-06-09 11:46:55 +00:00
Zlatko Buljan cd242c1655 [mips][microMIPS] Add CodeGen support for SEL.*, SELEQZ, SELNEZ, SELEQZ.*, SELNEZ.* and CMP.condn.fmt instructions
Differential Revision: http://reviews.llvm.org/D20862

llvm-svn: 272256
2016-06-09 11:15:53 +00:00
Diana Picus db2aff0ab4 [llc] Remove exit-on-error flag from MIR tests (PR27770)
This is made possible by removing an assert in llc that assumed
MIRParser::parseLLVMModule would exit on error. MIRParser's documentation states
that it returns null if a parsing error occurs, so there's no reason to assert.
We can instead just fall through to where the check for a module is performed
and exit if it is null.

This commit is part of the clean-up after r269655.

Fixes PR27770

Differential Revision: http://reviews.llvm.org/D20371

llvm-svn: 272254
2016-06-09 10:31:05 +00:00
Craig Topper 6f7288dc44 [AVX512] Fix shuffle decode printing for several instructions with write masks. There are still more bugs here with UNPCK and PALIGN for sure. But these were the easiest ones to fix.
llvm-svn: 272252
2016-06-09 07:49:08 +00:00
James Molloy feb9f4243b [Thumb] Select a BIC instead of AND if the immediate can be encoded more optimally negated
If an immediate is only used in an AND node, it is possible that the immediate can be more optimally materialized when negated. If this is the case, we can negate the immediate and use a BIC instead:

  int i(int a) {
    return a & 0xfffffeec;
  }

Used to produce:
    ldr r1, [CONSTPOOL]
    ands r0, r1
  CONSTPOOL: 0xfffffeec

And now produces:
    movs    r1, #255
    adds    r1, #20  ; Less costly immediate generation
    bics    r0, r1
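
The equivalence behind this rewrite can be checked with a small sketch. It is written in Python rather than taken from the patch, the helper names are hypothetical, and it only models 32-bit AND and BIC on the constant from the example above.

```python
MASK32 = 0xFFFFFFFF

def and_imm(a, imm):
    """Model of AND on 32-bit values: a & imm."""
    return a & imm & MASK32

def bic_imm(a, imm):
    """Model of BIC (bit clear) on 32-bit values: a & ~imm."""
    return a & ~imm & MASK32

# Negating the immediate from the example: ~0xfffffeec == 0x113 == 275.
negated = ~0xFFFFFEEC & MASK32

# movs r1, #255 ; adds r1, #20 builds the same value from two 8-bit immediates.
materialized = 255 + 20
```

For any 32-bit `a`, `and_imm(a, 0xFFFFFEEC)` and `bic_imm(a, materialized)` agree, which is why the constant pool load can be dropped.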

llvm-svn: 272251
2016-06-09 07:39:08 +00:00
Craig Topper 8537c11ff3 [X86] Fix a test I failed to re-generate in r272249.
llvm-svn: 272250
2016-06-09 07:10:34 +00:00
Craig Topper 7a2993093e [X86] Bring consistent naming to the SSE/AVX and AVX512 PALIGNR instructions. Then add shuffle decode printing for the EVEX forms which is made easier by having the naming structure more similar to other instructions.
llvm-svn: 272249
2016-06-09 07:06:38 +00:00
Quentin Colombet 2c6469687d [MIR] Check that generic virtual registers get a size.
Without that check it was possible to write test cases where the size
was not specified and we ended up with weird asserts down the road,
because the default value (1) would not make sense.

llvm-svn: 272226
2016-06-08 23:27:46 +00:00
Dehao Chen 769219b11a Revive http://reviews.llvm.org/D12778 to handle forward-hot-prob and backward-hot-prob consistently.
Summary:
Consider the following diamond CFG:

 A
/ \
B C
 \/
 D

Suppose A->B and A->C have probabilities 81% and 19%. In block-placement, A->B is called a hot edge and the final placement should be ABDC. However, the current implementation outputs ABCD. This is because when choosing the next block of B, it checks if Freq(C->D) > Freq(B->D) * 20%, which is true (if Freq(A) = 100, then Freq(B->D) = 81, Freq(C->D) = 19, and 19 > 81*20%=16.2). Actually, we should use 25% instead of 20% as the probability here, so that we have 19 < 81*25%=20.25, and the desired ABDC layout will be generated.
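
The arithmetic in this summary can be replayed with a small Python sketch. The function names are hypothetical and the model covers only this diamond, not the real block-placement pass; frequencies are kept as integer percentages to avoid rounding.

```python
def other_pred_wins(freq_hot_to_succ, freq_other_to_succ, threshold_pct):
    """The check from the summary: Freq(C->D) > Freq(B->D) * threshold."""
    return freq_other_to_succ * 100 > freq_hot_to_succ * threshold_pct

def diamond_layout(prob_ab_pct, threshold_pct, freq_a=100):
    """Layout of the A/B/C/D diamond when A->B is the hot edge."""
    freq_bd = freq_a * prob_ab_pct // 100  # Freq(B->D)
    freq_cd = freq_a - freq_bd             # Freq(C->D)
    if other_pred_wins(freq_bd, freq_cd, threshold_pct):
        return "ABCD"  # D is not placed right after B
    return "ABDC"      # D follows B as the fallthrough
```

With the numbers above, `diamond_layout(81, 20)` gives ABCD and `diamond_layout(81, 25)` gives ABDC.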

Reviewers: djasper, davidxl

Subscribers: llvm-commits

Differential Revision: http://reviews.llvm.org/D20989

llvm-svn: 272203
2016-06-08 21:30:12 +00:00
Quentin Colombet d1cd30b218 [AArch64][RegisterBankInfo] G_OR are fine on either GPR or FPR.
Teach AArch64RegisterBankInfo that G_OR can be mapped on either GPR or
FPR for 64-bit or 32-bit values.

Add test cases demonstrating how this information is used to coalesce a
computation on a single register bank.

llvm-svn: 272170
2016-06-08 16:53:32 +00:00
Oliver Stannard b3378e2f3c [ARM] MSR instructions implicitly set CPSR
The MSR instructions can write to the CPSR, but we did not model this
fact, so we could emit them in the middle of IT blocks, changing the
condition flags for later instructions in the block.

The tests use two calls to llvm.write_register.i32 because it is valid
to use these instructions at the end of an IT block, which if-conversion
does in some cases. With two calls, the first clobbers the flags, so
a branch has to be used to make the second one conditional.

Differential Revision: http://reviews.llvm.org/D21139

llvm-svn: 272154
2016-06-08 15:26:34 +00:00
Matthias Braun 3ef7df9cdf MIR: Fix parsing of stack object references in MachineMemOperands
The MachineMemOperand parser lacked the code to handle %stack.X
references (%fixed-stack.X was working).

llvm-svn: 272082
2016-06-08 00:47:07 +00:00
Nicolai Haehnle c00e03b8f5 AMDGPU: Add amdgpu-ps-wqm-outputs function attributes
Summary:
The presence of this attribute indicates that VGPR outputs should be computed
in whole quad mode. This will be used by Mesa for prolog pixel shaders, so
that derivatives can be taken of shader inputs computed by the prolog, fixing
a bug.

The generated code could certainly be improved: if a prolog pixel shader is
used (which isn't common in modern OpenGL - they're used for gl_Color, polygon
stipples, and forcing per-sample interpolation), Mesa will use this attribute
unconditionally, because it has to be conservative. So WQM may be used in the
prolog when it isn't really needed, and furthermore a silly back-and-forth
switch is likely to happen at the boundary between prolog and main shader
parts.

Fixing this is a bit involved: we'd first have to add a mechanism by which
LLVM writes the WQM-related input requirements to the main shader part binary,
and then Mesa specializes the prolog part accordingly. At that point, we may
as well just compile a monolithic shader...

Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=95130

Reviewers: arsenm, tstellarAMD, mareko

Subscribers: arsenm, llvm-commits, kzhuravl

Differential Revision: http://reviews.llvm.org/D20839

llvm-svn: 272063
2016-06-07 21:37:17 +00:00
Simon Pilgrim 536434e80f [X86][SSE4A] Regenerated SSE4A intrinsics tests
There are no VEX-encoded versions of SSE4A instructions, so make sure that AVX targets give the same output.

llvm-svn: 272060
2016-06-07 21:15:45 +00:00
Eric Christopher 538d09d0dd Revert "Differential Revision: http://reviews.llvm.org/D20557"
Author: Wei Ding <wei.ding2@amd.com>
Date:   Tue Jun 7 19:04:44 2016 +0000

    Differential Revision: http://reviews.llvm.org/D20557

    git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@272044
    91177308-0d34-0410-b5e6-96231b3b80d8

as it was breaking the bots.

This reverts commit r272044.

llvm-svn: 272056
2016-06-07 20:27:12 +00:00
Etienne Bergeron 22bfa83208 [stack-protection] Add support for MSVC buffer security check
Summary:
This patch adds support for the MSVC buffer security check implementation.

The buffer security check is turned on with the '/GS' compiler switch.
  * https://msdn.microsoft.com/en-us/library/8dbf701c.aspx
  * To be added to clang here: http://reviews.llvm.org/D20347

Some overview of buffer security check feature and implementation:
  * https://msdn.microsoft.com/en-us/library/aa290051(VS.71).aspx
  * http://www.ksyash.com/2011/01/buffer-overflow-protection-3/
  * http://blog.osom.info/2012/02/understanding-vs-c-compilers-buffer.html


For the following example:
```
int example(int offset, int index) {
  char buffer[10];
  memset(buffer, 0xCC, index);
  return buffer[index];
}
```

The MSVC compiler is adding these instructions to perform stack integrity check:
```
        push        ebp  
        mov         ebp,esp  
        sub         esp,50h  
  [1]   mov         eax,dword ptr [__security_cookie (01068024h)]  
  [2]   xor         eax,ebp  
  [3]   mov         dword ptr [ebp-4],eax  
        push        ebx  
        push        esi  
        push        edi  
        mov         eax,dword ptr [index]  
        push        eax  
        push        0CCh  
        lea         ecx,[buffer]  
        push        ecx  
        call        _memset (010610B9h)  
        add         esp,0Ch  
        mov         eax,dword ptr [index]  
        movsx       eax,byte ptr buffer[eax]  
        pop         edi  
        pop         esi  
        pop         ebx  
  [4]   mov         ecx,dword ptr [ebp-4]  
  [5]   xor         ecx,ebp  
  [6]   call        @__security_check_cookie@4 (01061276h)  
        mov         esp,ebp  
        pop         ebp  
        ret  
```

The instrumentation above is:
  * [1] loads the global security canary,
  * [3] stores the locally computed canary ([2]) to the guard slot,
  * [4] loads the guard slot and [5] recomputes the global canary,
  * [6] validates the resulting canary with '__security_check_cookie', which also performs the error handling.
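
The XOR round trip in steps [1] to [6] can be sketched as follows. This is a Python model with hypothetical helper names, not MSVC's or LLVM's implementation; the real '__security_check_cookie' handles the failure itself instead of returning a flag.

```python
MASK32 = 0xFFFFFFFF

def prologue(security_cookie, ebp):
    """Steps [1]-[3]: load the global cookie, XOR it with EBP, return the guard slot value."""
    return (security_cookie ^ ebp) & MASK32

def epilogue(guard_slot, ebp):
    """Steps [4]-[5]: reload the guard slot and XOR it with EBP again."""
    return (guard_slot ^ ebp) & MASK32

def check_cookie(value, security_cookie):
    """Stand-in for step [6]: compare the recomputed value with the global cookie."""
    return value == security_cookie
```

If the guard slot is untouched, the second XOR cancels the first and the check passes; any overwrite of the slot changes the recomputed value.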

Overview of the current stack-protection implementation:
  * lib/CodeGen/StackProtector.cpp
    * There is a default stack-protection implementation applied on intermediate representation.
    * The target can overload 'getIRStackGuard' method if it has a standard location for the stack protector cookie.
    * An intrinsic 'Intrinsic::stackprotector' is added to the prologue. It will be expanded by the instruction selection pass (DAG or Fast).
    * Basic blocks are added to every instrumented function to receive the code for handling stack guard validation and error handling.
    * Guard manipulation and comparison are added directly to the intermediate representation.

  * lib/CodeGen/SelectionDAG/SelectionDAGISel.cpp
  * lib/CodeGen/SelectionDAG/SelectionDAGBuilder.cpp
    * There is an implementation that adds instrumentation during instruction selection (for better handling of sibling calls).
      * see long comment above 'class StackProtectorDescriptor' declaration.
    * The target needs to override 'getSDagStackGuard' to activate SDAG stack protection generation. (note: getIRStackGuard MUST be nullptr).
      * 'getSDagStackGuard' returns the appropriate stack guard (security cookie)
    * The code is generated by 'SelectionDAGBuilder.cpp' and 'SelectionDAGISel.cpp'.

  * include/llvm/Target/TargetLowering.h
    * Contains the function to retrieve the default Guard 'Value'; it should be overridden by each target to select which implementation is used and to provide the Guard 'Value'.

  * lib/Target/X86/X86ISelLowering.cpp
    * Contains the x86 specialisation; Guard 'Value' used by the SelectionDAG algorithm.

Function-based Instrumentation:
  * MSVC doesn't inline the stack guard comparison in every function. Instead, a call to '__security_check_cookie' is added to the epilogue before every return instruction.
  * To support function-based instrumentation, this patch is
    * adding a function to get the function-based check (llvm 'Value', see include/llvm/Target/TargetLowering.h),
      * If provided, the stack protection instrumentation won't be inlined and a call to that function will be added to the epilogue.
    * modifying (SelectionDAGISel.cpp) to avoid producing basic blocks used for inline instrumentation,
    * generating the function-based instrumentation during the ISEL pass (SelectionDAGBuilder.cpp),
    * if FastISEL (not SelectionDAG), using the fallback which relies on the same function-based check implemented over the intermediate representation (StackProtector.cpp).

Modifications
  * adding support for MSVC (lib/Target/X86/X86ISelLowering.cpp)
  * adding support for function-based instrumentation (lib/CodeGen/SelectionDAG/SelectionDAGBuilder.cpp, .h)

Results

  * IR generated instrumentation:
```
clang-cl /GS test.cc /Od /c -mllvm -print-isel-input
```

```
*** Final LLVM Code input to ISel ***

; Function Attrs: nounwind sspstrong
define i32 @"\01?example@@YAHHH@Z"(i32 %offset, i32 %index) #0 {
entry:
  %StackGuardSlot = alloca i8*                                                  <<<-- Allocated guard slot
  %0 = call i8* @llvm.stackguard()                                              <<<-- Loading Stack Guard value
  call void @llvm.stackprotector(i8* %0, i8** %StackGuardSlot)                  <<<-- Prologue intrinsic call (store to Guard slot)
  %index.addr = alloca i32, align 4
  %offset.addr = alloca i32, align 4
  %buffer = alloca [10 x i8], align 1
  store i32 %index, i32* %index.addr, align 4
  store i32 %offset, i32* %offset.addr, align 4
  %arraydecay = getelementptr inbounds [10 x i8], [10 x i8]* %buffer, i32 0, i32 0
  %1 = load i32, i32* %index.addr, align 4
  call void @llvm.memset.p0i8.i32(i8* %arraydecay, i8 -52, i32 %1, i32 1, i1 false)
  %2 = load i32, i32* %index.addr, align 4
  %arrayidx = getelementptr inbounds [10 x i8], [10 x i8]* %buffer, i32 0, i32 %2
  %3 = load i8, i8* %arrayidx, align 1
  %conv = sext i8 %3 to i32
  %4 = load volatile i8*, i8** %StackGuardSlot                                  <<<-- Loading Guard slot
  call void @__security_check_cookie(i8* %4)                                    <<<-- Epilogue function-based check
  ret i32 %conv
}
```

  * SelectionDAG generated instrumentation:

```
clang-cl /GS test.cc /O1 /c /FA
```

```
"?example@@YAHHH@Z":                    # @"\01?example@@YAHHH@Z"
# BB#0:                                 # %entry
        pushl   %esi
        subl    $16, %esp
        movl    ___security_cookie, %eax                                        <<<-- Loading Stack Guard value
        movl    28(%esp), %esi
        movl    %eax, 12(%esp)                                                  <<<-- Store to Guard slot
        leal    2(%esp), %eax
        pushl   %esi
        pushl   $204
        pushl   %eax
        calll   _memset
        addl    $12, %esp
        movsbl  2(%esp,%esi), %esi
        movl    12(%esp), %ecx                                                  <<<-- Loading Guard slot
        calll   @__security_check_cookie@4                                      <<<-- Epilogue function-based check
        movl    %esi, %eax
        addl    $16, %esp
        popl    %esi
        retl
```

Reviewers: kcc, pcc, eugenis, rnk

Subscribers: majnemer, llvm-commits, hans, thakis, rnk

Differential Revision: http://reviews.llvm.org/D20346

llvm-svn: 272053
2016-06-07 20:15:35 +00:00
Wei Ding a70216f1b3 Differential Revision: http://reviews.llvm.org/D20557
llvm-svn: 272044
2016-06-07 19:04:44 +00:00
Geoff Berry 486f49cc63 Reapply [AArch64] Fix isLegalAddImmediate() to return true for valid negative values.
Originally reviewed here: http://reviews.llvm.org/D17463
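
As a rough illustration of what "valid negative values" can mean here: AArch64 ADD/SUB immediates are a 12-bit unsigned value, optionally shifted left by 12. The Python sketch below is an assumption about the shape of the check, not the code from the patch, and it assumes a negative addend can be emitted as a SUB of its magnitude.

```python
def is_legal_add_immediate(imm):
    """Sketch: True if |imm| fits a 12-bit immediate, optionally shifted left by 12."""
    imm = abs(imm)  # assumption: add of -imm can be emitted as sub of imm
    return (imm >> 12) == 0 or ((imm & 0xFFF) == 0 and (imm >> 24) == 0)
```

Under this model both `4095` and `-4095` are legal, while `4097` is not because it needs bits from both the low and the shifted field.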

llvm-svn: 272023
2016-06-07 16:48:43 +00:00
Haicheng Wu 4fa9f3ae45 Revert "[MBP] Reduce code size by running tail merging in MBP."
This reverts commit r271930, r271915, r271923.  They break a thumb selfhosting
bot.

llvm-svn: 272017
2016-06-07 15:17:21 +00:00
Simon Pilgrim 15c6ab5fac [X86][AVX512] Added 512-bit integer vector non-temporal load tests
llvm-svn: 272016
2016-06-07 15:12:47 +00:00
Simon Pilgrim 9a89623b57 [X86][SSE] Add general lowering of nontemporal vector loads
Currently the only way to use the (V)MOVNTDQA nontemporal vector loads instructions is through the int_x86_sse41_movntdqa style builtins.

This patch adds support for lowering nontemporal loads from general IR, allowing us to remove the movntdqa builtins in a future patch.

We currently still fold nontemporal loads into suitable instructions; we should probably look at removing this (and nontemporal stores as well) or at least make the target's folding implementation aware that it's dealing with a nontemporal memory transaction.

There is also an issue that VMOVNTDQA only acts on 128-bit vectors on pre-AVX2 hardware - so currently a normal ymm load is still used on AVX1 targets.

Differential Review: http://reviews.llvm.org/D20965

llvm-svn: 272010
2016-06-07 13:34:24 +00:00
James Molloy b101383fb5 [Thumb-1] Add optimized constant materialization for integers [256..512)
We can materialize these integers using a MOV; ADDi8 pair.
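
A rough sketch of the split, in Python rather than TableGen. The function name is hypothetical and the choice of 255 as the MOV immediate is an assumption; with two unsigned 8-bit immediates the sum cannot exceed 255 + 255 = 510.

```python
IMM8_MAX = 255  # largest unsigned 8-bit immediate

def split_mov_add(value):
    """Return (mov_imm, add_imm) with both parts in 0..255, or None if no such split exists."""
    mov_imm = IMM8_MAX
    add_imm = value - mov_imm
    if value < 256 or add_imm > IMM8_MAX:
        return None
    return mov_imm, add_imm
```

For example, `split_mov_add(275)` returns `(255, 20)`, matching a `movs r1, #255` followed by `adds r1, #20`.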

llvm-svn: 272007
2016-06-07 13:10:14 +00:00
Igor Breger 61e628591f [AVX512] Fix load opcode for fast isel.
Differential Revision: http://reviews.llvm.org/D21067

llvm-svn: 272006
2016-06-07 13:08:45 +00:00
Ulrich Weigand 6b0634b304 [PowerPC] Support multiple return values with fast isel
Using an LLVM IR aggregate return value type containing three
or more integer values causes an abort in the fast isel pass.

This patch adds two more registers to RetCC_PPC64_ELF_FIS to
allow returning up to four integers with fast isel, just the
same as is currently supported with regular isel (RetCC_PPC).

This is needed for Swift and (possibly) other non-clang frontends.

Fixes PR26190.

llvm-svn: 272005
2016-06-07 12:48:22 +00:00
Simon Pilgrim ca1da1bf07 [X86][SSE] Improved blend+zero target shuffle combining to use combined shuffle mask directly
We currently only combine to blend+zero if the target value type has 8 elements or less, but this was missing a lot of cases where the combined mask had been widened.

This change makes it so we use the combined mask to determine the blend value type, allowing us to catch more widened cases.

llvm-svn: 272003
2016-06-07 12:20:14 +00:00
James Molloy 53298a1808 [ARM] Shrink post-indexed LDR and STR to LDM/STM
A Thumb-2 post-indexed LDR instruction such as:

  ldr.w r0, [r1], #4

Can be rewritten as:

  ldm.n r1!, {r0}

LDMs can be more expensive than LDRs on some cores, so this has been enabled only in minsize mode.
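
The two forms can be compared with a small register/memory model. This is a Python sketch with hypothetical helper names; it ignores encodings, timing, and the case where the base register is also in the register list.

```python
def ldr_post_indexed(memory, regs, rt, rn, offset):
    """ldr rt, [rn], #offset: load from [rn], then rn += offset."""
    regs = dict(regs)
    addr = regs[rn]
    regs[rt] = memory[addr]
    regs[rn] = addr + offset
    return regs

def ldm_writeback(memory, regs, rn, reglist):
    """ldm rn!, {reglist}: load consecutive words, then rn += 4 * len(reglist)."""
    regs = dict(regs)
    addr = regs[rn]
    for i, rt in enumerate(reglist):
        regs[rt] = memory[addr + 4 * i]
    regs[rn] = addr + 4 * len(reglist)
    return regs
```

With an offset of 4 and a single register in the list, both functions leave the registers in the same state, which is the case the rewrite relies on.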

llvm-svn: 272002
2016-06-07 12:13:34 +00:00
James Molloy 75afc95112 [ARM] Transform LDMs into writeback form to save code size
If we have an LDM that uses only low registers and doesn't write to its base register:

  ldm.w r0, {r1, r2, r3}

And that base register is dead after the LDM, then we can convert it to writeback form and use a narrow encoding:

  ldm.n r0!, {r1, r2, r3}

Obviously, this introduces a new register write and so can cause WAW hazards, so I've enabled it only in minsize mode. This is a code size trick that ARM Compiler 5 ("armcc") does that we don't.

llvm-svn: 272000
2016-06-07 11:47:24 +00:00
Saleem Abdulrasool 532dcbc2c5 ARM: correct TLS access on WoA
TLS access requires an offset from the TLS index.  The index itself is the
section-relative distance of the symbol.  For ARM, the relevant relocation
(IMAGE_REL_ARM_SECREL) is applied as a constant.  This means that the value may
not be an immediate and must be lowered into a constant pool.  This offset will
not be base relocated.  We were previously emitting the actual address of the
symbol which would be base relocated and would therefore be the value offset by
the ImageBase + TLS Offset.

llvm-svn: 271974
2016-06-07 03:15:07 +00:00
Matt Arsenault 3b2e2a59e8 AMDGPU: Fix constantexpr addrspacecasts
If we had a constant group address space cast the queue pointer
wasn't enabled for the function, resulting in a crash on noreg
later.

llvm-svn: 271935
2016-06-06 20:03:31 +00:00
Haicheng Wu 9ed77af89d Fix a test case. NFC.
llvm-svn: 271930
2016-06-06 19:11:53 +00:00
Haicheng Wu 77ea344786 [MBP] Reduce code size by running tail merging in MBP.
The code layout that TailMerging (inside BranchFolding) works on is not the
final layout optimized based on the branch probability. Generally, after
BlockPlacement, many new merging opportunities emerge.

This patch calls Tail Merging after MBP and calls MBP again if Tail Merging
merges anything.

Differential Revision: http://reviews.llvm.org/D20276

llvm-svn: 271925
2016-06-06 18:36:07 +00:00
Artem Tamazov 135487767b [AMDGPU][llvm-mc] v_cndmask_b32: src2 is mandatory; do not enforce VOP2 when src2 == VCC.
Another step towards unifying the llvm assembler/disassembler with sp3.
Besides, CodeGen output is a bit improved, thus changes in CodeGen tests.
Assembler/Disassembler tests updated/added.

Differential Revision: http://reviews.llvm.org/D20796

llvm-svn: 271900
2016-06-06 15:23:43 +00:00
Igor Breger edafb0595e [KNL] Fix UMULO lowering.
Differential Revision: http://reviews.llvm.org/D21013

llvm-svn: 271891
2016-06-06 12:24:52 +00:00
Craig Topper 33350cc406 [AVX512] Remove masked palignr intrinsics and auto-upgrade them to native IR of vector shuffle and select.
llvm-svn: 271872
2016-06-06 06:12:54 +00:00
Craig Topper 143446d5c1 [AVX512] Add PALIGNR shuffle lowering for v32i16 and v16i32.
llvm-svn: 271870
2016-06-06 05:39:10 +00:00
Craig Topper ccad6d57c1 [AVX512] Update tests to show shuffle decoding for vpshuflw/vpshufhw.
llvm-svn: 271869
2016-06-06 05:39:07 +00:00
Simon Pilgrim 64c6de4525 [X86][XOP] Added VPERMIL2PD/VPERMIL2PS raw mask decoding for target shuffle combines
llvm-svn: 271834
2016-06-05 15:21:30 +00:00
Simon Pilgrim 478295dadd [X86][XOP] Added VPERMIL2PD/VPERMIL2PS as a target shuffle type
llvm-svn: 271831
2016-06-05 15:01:45 +00:00
Craig Topper 8eeda57a40 [AVX512] Add support for lowering PALIGNR for v64i8.
Could do this for other types too, but this is what's needed to replace the intrinsic with native IR in clang.

llvm-svn: 271828
2016-06-05 06:29:12 +00:00
Craig Topper 5a315d4613 [AVX512] Split command lines and regenerate a test to prepare for a future commit.
llvm-svn: 271827
2016-06-05 06:29:08 +00:00
Craig Topper 9f51c9ef15 [AVX512] Fix PANDN combining for v4i32/v8i32 when VLX is enabled.
v4i32/v8i32 ANDs aren't promoted to v2i64/v4i64 when VLX is enabled.

llvm-svn: 271826
2016-06-05 05:35:11 +00:00
Simon Pilgrim 2ead861d07 [X86][XOP] Added VPERMIL2PD/VPERMIL2PS shuffle mask comment decoding
llvm-svn: 271809
2016-06-04 21:44:28 +00:00
Saleem Abdulrasool 1fcdc23a6e X86: enable TLS on Windows itanium
Windows itanium is nearly identical to windows-msvc (MS ABI for C, itanium for
C++).  Enable the TLS support for the target similar to the MSVC model.

llvm-svn: 271797
2016-06-04 18:27:22 +00:00
Simon Pilgrim fd2eda4f64 [X86][AVX2] Fix v16i16 SHL lowering (PR27730)
The AVX2 v16i16 shift lowering works by unpacking to 2 x v8i32, performing the shift and then truncating the result.

The unpacking is used to place the values in the upper 16-bits so that we can correctly sign-extend for SRA shifts. Unfortunately we weren't ensuring that the lower 16-bits were zero to ensure that SHL correctly shifts in zero bits.
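
A scalar model of one lane shows why the lower 16 bits matter for SHL. This is a Python sketch with a hypothetical function name; the real lowering works on whole vectors and the exact instruction sequence is not given in this message.

```python
MASK16 = 0xFFFF
MASK32 = 0xFFFFFFFF

def shl16_via_i32(value, amount, low_bits):
    """Shift a 16-bit lane left by going through a 32-bit lane.

    value is placed in the upper 16 bits; low_bits is whatever the
    unpack left in the lower 16 bits.
    """
    wide = ((value & MASK16) << 16) | (low_bits & MASK16)
    shifted = (wide << amount) & MASK32
    return shifted >> 16  # truncate back to the upper half
```

With `low_bits` set to 0 the result is the expected `(value << amount) & 0xFFFF`; with non-zero low bits, some of them are shifted into the bottom of the result.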

llvm-svn: 271796
2016-06-04 16:45:33 +00:00
Matthias Braun c25c9ccbcb MIR: Support MachineMemOperands without associated value
This is allowed (though used rarely) and useful to keep your tests
short.

llvm-svn: 271752
2016-06-04 00:06:31 +00:00
Chad Rosier 9faa5bcf13 [AArch64] Move tests from r271677 to a more appropriately named file. NFC.
llvm-svn: 271718
2016-06-03 20:11:09 +00:00
Chad Rosier be879ea751 [AArch64] Spot SBFX-compatible code expressed with sign_extend.
This is very similar to r271677, but for extracts from i32 with the SIGN_EXTEND
acting on an arithmetic shift.

llvm-svn: 271717
2016-06-03 20:05:49 +00:00
Derek Schuff 5859a9ed80 [WebAssembly] Emit type signatures for declared functions
Under emscripten, C code can take the address of a function implemented
in Javascript (which is exposed via an import in wasm). Because imports
do not have a linear memory address in wasm, we need to generate a thunk
to be the target of the indirect call; it calls the import directly.

To make this possible, LLVM needs to emit the type signatures for these
functions, because they may not be called directly or referred to other
than where the address is taken.

This uses a new .s directive (.functype) which specifies the signature.

Differential Revision: http://reviews.llvm.org/D20891

Re-apply r271599 but instead of bailing with an error when a declared
function has multiple returns, replace it with a pointer argument. Also
add the test case I forgot to 'git add' last time around.

llvm-svn: 271703
2016-06-03 18:34:36 +00:00
Sjoerd Meijer 9bc93f6298 Code size optimisation: do not inline memcpy if this expansion results
in more instructions than the library call.

Differential Revision: http://reviews.llvm.org/D20958

llvm-svn: 271678
2016-06-03 15:38:55 +00:00
Chad Rosier 2d658703e1 [AArch64] Spot SBFX-compatible code expressed with sign_extend_inreg.
We were assuming all SBFX-like operations would have the shl/asr form, but often
when the field being extracted is an i8 or i16, we end up with a
SIGN_EXTEND_INREG acting on a shift instead.

This is a port of r213754 from ARM to AArch64.
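
The two forms described above compute the same signed bitfield extract, which can be checked with a Python model. The function names are hypothetical and the model assumes 32-bit values held as non-negative Python integers.

```python
def sign_extend_inreg(value, width, bits=32):
    """Sign-extend the low `width` bits of value to a `bits`-wide two's complement integer."""
    value &= (1 << width) - 1
    if value & (1 << (width - 1)):
        value -= 1 << width
    return value & ((1 << bits) - 1)

def sbfx(value, lsb, width, bits=32):
    """Model of SBFX as SIGN_EXTEND_INREG acting on a right shift."""
    return sign_extend_inreg(value >> lsb, width, bits)

def shl_asr(value, lsb, width, bits=32):
    """The shl/asr form: move the field to the top, then arithmetic shift right."""
    mask = (1 << bits) - 1
    left = (value << (bits - lsb - width)) & mask
    if left & (1 << (bits - 1)):  # reinterpret as a signed integer
        left -= 1 << bits
    return (left >> (bits - width)) & mask
```

For an i8 field at bit 8, `sbfx(0x0000F580, 8, 8)` and `shl_asr(0x0000F580, 8, 8)` both give `0xFFFFFFF5`.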

llvm-svn: 271677
2016-06-03 15:00:09 +00:00
Simon Pilgrim ff35eecd90 [X86][AVX512] Fixed 512-bit vector nontemporal load alignment
llvm-svn: 271673
2016-06-03 14:12:43 +00:00
Simon Pilgrim f92d175a78 [X86][AVX512] Added 512-bit vector nontemporal load tests
llvm-svn: 271668
2016-06-03 13:42:49 +00:00
Simon Pilgrim a6022c9a63 [X86][SSE] Added nontemporal load tests
These currently all lower to regular loads, generic nontemporal load support will be added in a future patch

llvm-svn: 271659
2016-06-03 11:00:55 +00:00
Simon Pilgrim 960ca812ed [X86] Added nontemporal scalar store tests
llvm-svn: 271656
2016-06-03 10:30:54 +00:00
Simon Pilgrim 02284541b2 [X86][SSE] Regenerated nontemporal vector store tests and added extra target types
llvm-svn: 271654
2016-06-03 10:24:24 +00:00
Simon Pilgrim 38b4661b1b [X86] Regenerated nontemporal store tests and added tests for all 128-bit vector types
llvm-svn: 271651
2016-06-03 10:15:36 +00:00
Simon Pilgrim 205f65f62f [X86][AVX2] Relaxed alignment on nontemporal store tests
llvm-svn: 271646
2016-06-03 10:06:59 +00:00
Simon Pilgrim 8ea8940677 [X86][AVX2] Regenerated nontemporal store tests and added tests for all 256-bit vector types
llvm-svn: 271645
2016-06-03 09:56:24 +00:00
Daniel Sanders 6ba3dd6b71 [mips] Implement 'la' macro in PIC mode for O32.
Summary:
N32 support will follow in a later patch since the symbol version of 'la'
incorrectly believes N32 to have 64-bit pointers and rejects it early.

This fixes the three incorrectly expanded 'la' macros found in bionic.

Reviewers: sdardis

Subscribers: dsanders, llvm-commits, sdardis

Differential Revision: http://reviews.llvm.org/D20820

llvm-svn: 271644
2016-06-03 09:53:06 +00:00
Simon Pilgrim e85506b6e0 [X86][XOP] Support for VPERMIL2PD/VPERMIL2PS 2-input shuffle instructions
This patch begins adding support for lowering to the XOP VPERMIL2PD/VPERMIL2PS shuffle instructions - adding the X86ISD::VPERMIL2 opcode and cleaning up the usage.

The internal llvm intrinsics were assuming the shuffle mask operand was the same type as the float/double input operands (I guess to simplify the intrinsic definitions in X86InstrXOP.td to a single value type). These needed changing to integer types (matching the clang builtin and the AMD intrinsics definitions), an auto upgrade path is added to convert old calls.

Mask decoding/target shuffle support will be added in future patches.

Differential Revision: http://reviews.llvm.org/D20049

llvm-svn: 271633
2016-06-03 08:06:03 +00:00
Craig Topper e7ae106147 [AVX512] Ensure EVEX vpshufd, vpshuflw, and vpshufhw have isel priority over the VEX encoded ones.
llvm-svn: 271629
2016-06-03 05:31:04 +00:00
Craig Topper 01f53b1773 [AVX512] Fix shuffle comment printing for EVEX encoded PSHUFD, PSHUFHW, and PSHUFLW.
llvm-svn: 271628
2016-06-03 05:31:00 +00:00
Derek Schuff f5bae9c1ce Revert "[WebAssembly] Emit type signatures for declared functions"
This reverts r271599, it broke the integration tests.
More places than I expected had nontrivial return types in imports, or
else the check was wrong.

llvm-svn: 271606
2016-06-02 23:02:44 +00:00
Derek Schuff 23b7d65fe5 [WebAssembly] Emit type signatures for declared functions
Under emscripten, C code can take the address of a function implemented
in Javascript (which is exposed via an import in wasm). Because imports
do not have a linear memory address in wasm, we need to generate a thunk
to be the target of the indirect call; it calls the import directly.

To make this possible, LLVM needs to emit the type signatures for these
functions, because they may not be called directly or referred to other
than where the address is taken.

This uses a new .s directive (.functype) which specifies the signature.

Differential Revision: http://reviews.llvm.org/D20891

llvm-svn: 271599
2016-06-02 21:34:18 +00:00
Sanjay Patel dba8b4c04d transform obscured FP sign bit ops into a fabs/fneg using TLI hook
This is effectively a revert of:
http://reviews.llvm.org/rL249702 - [InstCombine] transform masking off of an FP sign bit into a fabs() intrinsic call (PR24886)
and:
http://reviews.llvm.org/rL249701 - [ValueTracking] teach computeKnownBits that a fabs() clears sign bits
and a reimplementation as a DAG combine for targets that have IEEE754-compliant fabs/fneg instructions.
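
The "obscured" sign bit operations are integer AND/XOR on the bit pattern of a float. A minimal Python sketch of the equivalence for IEEE754 single precision follows; the helper names are hypothetical and this is not the DAG combine itself.

```python
import struct

SIGN_BIT = 0x80000000

def f32_bits(x):
    """Bit pattern of x as an IEEE754 single-precision value."""
    return struct.unpack("<I", struct.pack("<f", x))[0]

def bits_f32(b):
    """Single-precision value with bit pattern b."""
    return struct.unpack("<f", struct.pack("<I", b))[0]

def fabs_via_and(x):
    """Masking off the sign bit behaves like fabs."""
    return bits_f32(f32_bits(x) & ~SIGN_BIT & 0xFFFFFFFF)

def fneg_via_xor(x):
    """Flipping the sign bit behaves like fneg."""
    return bits_f32(f32_bits(x) ^ SIGN_BIT)
```

Both helpers only touch the top bit, so they also work on zeros and infinities without any floating-point arithmetic.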

This is intended to resolve the objections raised on the dev list:
http://lists.llvm.org/pipermail/llvm-dev/2016-April/098154.html
and:
https://llvm.org/bugs/show_bug.cgi?id=24886#c4

In the interest of patch minimalism, I've only partly enabled AArch64. PowerPC, MIPS, x86 and others can enable later.

Differential Revision: http://reviews.llvm.org/D19391

llvm-svn: 271573
2016-06-02 20:01:37 +00:00
Matt Arsenault d1097a38e2 AMDGPU: Cleanup load tests
There are a lot of different kinds of loads to test for,
and these were scattered around inconsistently with
some redundancy. Try to comprehensively test all loads
in a consistent way.

llvm-svn: 271571
2016-06-02 19:54:26 +00:00
Matt Arsenault 52dec8d36a AMDGPU: Temporary fix for broken store combine
llvm-svn: 271567
2016-06-02 19:00:55 +00:00
Matt Arsenault 8e00194be8 AMDGPU: Fix crashes on unknown processor name
If the processor name failed to parse for amdgcn,
the resulting output would have R600 ISA in it.

If the processor name was missing or invalid for R600,
the wavefront size would not be set and there would be
crashes from missing itinerary data.

Fixes crashes in future commit caused by dividing by the unset/0
wavefront size.

llvm-svn: 271561
2016-06-02 18:37:16 +00:00
Geoff Berry c932f533e1 [PowerPC] Run reg2mem on tests to simplify them.
Summary:
Also convert test/CodeGen/PowerPC/vsx-ldst-builtin-le.ll to use
FileCheck instead of two grep and count runs.

This change is needed to avoid spurious diffs in these tests when
EarlyCSE is improved to use MemorySSA and can do more load elimination.

Reviewers: hfinkel

Subscribers: mcrosier, llvm-commits

Differential Revision: http://reviews.llvm.org/D20238

llvm-svn: 271553
2016-06-02 18:02:50 +00:00
Simon Pilgrim ab95b2fe26 [X86][SSE] Added SSE41/AVX2 non-temporal tests
Useful for when we add MOVNTDQA support

llvm-svn: 271552
2016-06-02 18:01:21 +00:00
Dimitry Andric 6a482a73d6 Only attempt to detect AVG if SSE2 is available
Summary:
In PR29973 Sanjay Patel reported an assertion failure when a certain
loop was optimized, for a target without SSE2 support.  It turned out
this was because of the AVG pattern detection introduced in rL253952.

Prevent the assertion failure by bailing out early in
`detectAVGPattern()`, if the target does not support SSE2.

Also add a minimized test case.

Reviewers: congh, eli.friedman, spatel

Subscribers: emaste, llvm-commits

Differential Revision: http://reviews.llvm.org/D20905

llvm-svn: 271548
2016-06-02 17:30:49 +00:00
Geoff Berry 66f6b65fed [PEI, AArch64] Use empty spaces in stack area for local stack slot allocation.
Summary:
If the target requests it, use empty spaces in the fixed and
callee-save stack area to allocate local stack objects.

AArch64: Change last callee-save reg stack object alignment instead of
size to leave a gap to take advantage of above change.

Reviewers: t.p.northover, qcolombet, MatzeB

Subscribers: rengolin, mcrosier, llvm-commits, aemerson

Differential Revision: http://reviews.llvm.org/D20220

llvm-svn: 271527
2016-06-02 16:22:07 +00:00
Sanjay Patel f509d85a6d [DAG] use getBitcast() to reduce code
Although this was intended to be NFC, the test case wiggle shows a change in
code scheduling/RA caused by a difference in the SDLoc() generation.

Depending on how you look at it, this is the (dis)advantage of exact checking
in regression tests.

llvm-svn: 271526
2016-06-02 16:01:15 +00:00
Simon Pilgrim ebdc397c86 [X86][SSE] Added non-temporal load tests for vector types
These currently lower to regular loads instead of MOVNTDQA

llvm-svn: 271516
2016-06-02 13:51:50 +00:00
Simon Pilgrim 0afd5a4d80 [X86][SSE] Replace (V)CVTTPS2DQ and VCVTTPD2DQ truncating (round to zero) f32/f64 to i32 with generic IR (llvm)
This patch removes the llvm intrinsics (V)CVTTPS2DQ and VCVTTPD2DQ truncation (round to zero) conversions and auto-upgrades to FP_TO_SINT calls instead.

Note: I looked at updating CVTTPD2DQ as well but this still requires a lot more work to correctly lower.

Differential Revision: http://reviews.llvm.org/D20860

llvm-svn: 271510
2016-06-02 10:55:21 +00:00
Sjoerd Meijer 0b7bb16e5b This adds support for Cortex-A73 as an available target.
Differential Revision: http://reviews.llvm.org/D20865

llvm-svn: 271508
2016-06-02 10:48:52 +00:00
Craig Topper ca9c0801e1 [X86] Add AVX 256-bit load and stores to fast isel.
I'm not sure why this was missing for so long.

This also exposed that we were picking floating point 256-bit VMOVNTPS for some integer types in normal isel for AVX1 even though VMOVNTDQ is available. In practice it doesn't matter due to the execution dependency fix pass, but it required extra isel patterns. Fixing that in a follow up commit.

llvm-svn: 271481
2016-06-02 04:19:45 +00:00
Craig Topper f10fbfa738 [AVX512] Remove masked load intrinsics. Clang now emits generic masked load intrinsics instead.
The intrinsics will be autoupgraded to the same generic masked loads.

llvm-svn: 271478
2016-06-02 04:19:36 +00:00
Rafael Espindola 41410cc812 Avoid a load for local functions.
llvm-svn: 271437
2016-06-01 21:57:11 +00:00
Sanjay Patel b4a4357ecb [x86, AVX2] regenerate checks
llvm-svn: 271434
2016-06-01 21:32:56 +00:00
Michael Kuperstein 738ae45ce8 [DAG] Improve legalization of INSERT_SUBVECTOR
When the index is known to be constant 0, insert directly into the low half,
instead of spilling, performing the insert in-memory, and reloading.

Differential Revision: http://reviews.llvm.org/D20763

llvm-svn: 271428
2016-06-01 20:49:35 +00:00
Keno Fischer 5573483c5b [PPC64] Fix SUBFC8 Defs list
Fix PR27943 "Bad machine code: Using an undefined physical register".
SUBFC8 implicitly defines the CR0 register, but this was omitted in
the instruction definition.

Patch by Jameson Nash <jameson@juliacomputing.com>

Reviewers: hfinkel
Differential Revision: http://reviews.llvm.org/D20802

llvm-svn: 271425
2016-06-01 20:31:07 +00:00
Than McIntosh 4ef761aa35 Better fix for PR27903.
Summary:
Re-enable lifetime-start-on-first-use for stack coloring,
but explicitly disable it for slots with more than one start
or end lifetime marker.

Bug: 27903

Reviewers: wmi, tejohnson, qcolombet, gbiv

Subscribers: llvm-commits

Differential Revision: http://reviews.llvm.org/D20739

llvm-svn: 271412
2016-06-01 17:55:10 +00:00
Simon Pilgrim 1cd61b82bd [X86][SSE] Added non-temporal store tests for all 512-bit vector types
llvm-svn: 271393
2016-06-01 13:58:00 +00:00
Simon Pilgrim 288be8bab6 [X86][SSE] Added non-temporal store tests for all 256-bit vector types
Also added KNL AVX-512 checks

llvm-svn: 271391
2016-06-01 13:20:25 +00:00
Simon Pilgrim 80f5335969 [X86][SSE] Added non-temporal store tests for all 128-bit integer vector types
llvm-svn: 271389
2016-06-01 13:05:00 +00:00
Michael Zuckerman 6a894956fc Adding back-end support to two bit scanning intrinsics
Adding LLVM back-end support to two intrinsics dealing with bit scan: _bit_scan_forward and _bit_scan_reverse.
Their functionality is as described in Intel intrinsics guide:
https://software.intel.com/sites/landingpage/IntrinsicsGuide/#text=_bit_scan_forward&expand=371,370
https://software.intel.com/sites/landingpage/IntrinsicsGuide/#text=_bit_scan_reverse&expand=371,370

Commit on behalf of Omer Paparo Bivas


Differential Revision: http://reviews.llvm.org/D19915

llvm-svn: 271386
2016-06-01 12:02:37 +00:00
Oliver Stannard 92ca83cccd [ARM] Add additional matching for UBFX instructions
This adds an additional matcher to select UBFX(..) from SRL(AND(..)) in
ARMISelDAGToDAG to help with code size.

Patch by David Green.

Differential Revision: http://reviews.llvm.org/D20667

llvm-svn: 271384
2016-06-01 12:01:01 +00:00
Chris Dewhurst 53bde954db [Sparc] Allow passing of empty structs.
Passing an empty struct as a function call argument is now supported.

Unit tests for various scenarios added.

llvm-svn: 271374
2016-06-01 08:48:56 +00:00
Craig Topper 4f2d5a68d3 Revert r271362 "[AVX512] Remove masked load intrinsics. Clang now emits generic masked load intrinsics instead."
Looks like something isn't quite right still. Also forgot to move the test cases to an autoupgrade test.

llvm-svn: 271363
2016-06-01 05:57:55 +00:00
Craig Topper dacd9d2bac [AVX512] Remove masked load intrinsics. Clang now emits generic masked load intrinsics instead.
The intrinsics will be autoupgraded to the same generic masked loads.

llvm-svn: 271362
2016-06-01 05:35:16 +00:00
Matthias Braun f9acacaa92 CodeGen: Refactor renameDisconnectedComponents() as a pass
Refactor LiveIntervals::renameDisconnectedComponents() to be a pass.
Also change the name to "RenameIndependentSubregs":

- renameDisconnectedComponents() worked on a MachineFunction at a time
  so it is a natural candidate for a machine function pass.

- The algorithm is testable with a .mir test now.

- This also fixes a problem where the lazy renaming as part of the
  MachineScheduler introduced IMPLICIT_DEF instructions after the number
  of nodes in a region was counted, leading to a mismatch.

Differential Revision: http://reviews.llvm.org/D20507

llvm-svn: 271345
2016-05-31 22:38:06 +00:00
Kevin B. Smith ed0b620a65 [X86]: Add a pattern that uses GR16_ABCD rather than GR32_ABCD to avoid falsely marking whole 32 bit register as live.
Differential Revision: http://reviews.llvm.org/D20649

llvm-svn: 271341
2016-05-31 22:00:12 +00:00
Matthias Braun ce0bcb78e6 ARM: Improve/fix comment in recently added test.
llvm-svn: 271340
2016-05-31 21:59:59 +00:00
Matthias Braun fe725c9241 ARM: Do not attempt to modify register class of physregs.
Physregs have no associated register class, do not attempt to modify it
in Thumb2InstrInfo::storeRegToStackSlot()/loadFromStackSlot().

llvm-svn: 271339
2016-05-31 21:39:12 +00:00
Ahmed Bougacha 96ef87e910 [CodeGen] Promote FMINNAN/FMAXNAN like other binops.
We think it's OK to generate half fminnan because it's legal for the
transform-to type (f32; r245196). However, PromoteFloatRes was missing
the case; simply promote like the other binops, including minnum.

llvm-svn: 271317
2016-05-31 18:50:25 +00:00
Rafael Espindola 4d29099f7f Delete AArch64II::MO_CONSTPOOL.
A constant pool holding the address of a variable is equivalent to
a got entry. It produces exactly the same instruction sequence as a
got use and unlike a got use this is not uniqued by the linker.

llvm-svn: 271311
2016-05-31 18:31:14 +00:00
Rafael Espindola 7ad97b2fe4 Add a use of shouldAssumeDSOLocal to ARM.
Now this code path knows about position independent executables.

llvm-svn: 271290
2016-05-31 15:31:55 +00:00
Ranjeet Singh 16c24f4d6e [ARM] Add backend support for load/store intrinsics.
Added support to map intrinsics
__builtin_arm_{ldc,ldcl,ldc2,ldc2l,stc,stcl,stc2,stc2l}
to their ARM instructions.

Differential Revision: http://reviews.llvm.org/D20564

llvm-svn: 271271
2016-05-31 12:39:30 +00:00
Simon Pilgrim e05dc45897 [X86][SSE] Add load-folding patterns for (V)CVTDQ2PD (PR27291)
Added patterns for (V)CVTDQ2PD -> 2f64 loading from a 64-bit source.

llvm-svn: 271269
2016-05-31 12:04:35 +00:00
Simon Dardis 03676dc969 [mips] bnec/beqc register constraint fix
beqc and bnec cannot have $rs == $rt. Inhibit compact branch creation
if that would occur.

Reviewers: vkalintiris, dsanders

Differential Revision: http://reviews.llvm.org/D20624

llvm-svn: 271260
2016-05-31 09:54:55 +00:00
Igor Breger 73ee8ba9b0 [AVX512] Fix intrinsic vcvtps2ph lowering.
Differential Revision: http://reviews.llvm.org/D20788

llvm-svn: 271255
2016-05-31 08:04:21 +00:00
Igor Breger 52bd1d5fcc Fix intrinsic vbroadcast{i32|f32}x2 lowering.
Differential Revision: http://reviews.llvm.org/D20780

llvm-svn: 271254
2016-05-31 07:43:39 +00:00
Craig Topper 50f85c22c5 [AVX512] Remove masked store intrinsics. Clang now emits generic masked store intrinsics instead.
The intrinsics will be autoupgraded to the same generic masked stores.

llvm-svn: 271245
2016-05-31 01:50:02 +00:00
Saleem Abdulrasool d2f705ddf9 X86: permit using SjLj EH on x86 targets as an option
This adds support to the backend to actually support SjLj EH as an exception
model.  This is *NOT* the default model, and requires explicitly opting into it
from the frontend.  GCC supports this model and for MinGW can still be enabled
via the `--using-sjlj-exceptions` option.

Addresses PR27749!

llvm-svn: 271244
2016-05-31 01:48:07 +00:00
Craig Topper 8287fd8abd [X86] Remove SSE/AVX unaligned store intrinsics as clang no longer uses them. Auto upgrade to native unaligned store instructions.
llvm-svn: 271236
2016-05-30 23:15:56 +00:00
Craig Topper 39716f8358 [X86] Use update_llc_test_checks.py to re-generate a test in preparation for an upcoming commit. NFC
llvm-svn: 271234
2016-05-30 22:54:14 +00:00
Simon Pilgrim d788c9d83d [X86][XOP] Split off auto-upgraded xop intrinsics
llvm-svn: 271228
2016-05-30 19:50:56 +00:00
Simon Pilgrim 582d75b0eb [X86][SSE] Renamed pmovxrm tests
These aren't intrinsics anymore - as discussed on D20686

llvm-svn: 271226
2016-05-30 19:14:37 +00:00
Simon Pilgrim 24da61058a [X86][AVX2] Regenerated AVX2 extension tests
llvm-svn: 271224
2016-05-30 18:49:57 +00:00
Simon Pilgrim d64af65f6d [X86][SSE] Updated storeu fast-isel tests to match clang builtin tests
Since rL271214 the headers have no longer used the storeu intrinsic

llvm-svn: 271222
2016-05-30 18:42:51 +00:00
Simon Pilgrim 4ed0e07b23 [X86][SSE2] Updated _mm_store_pd1/_mm_store1_pd fast-isel tests to match D20617
llvm-svn: 271220
2016-05-30 18:18:44 +00:00
Diana Picus f353a5e06d [BPF] Remove exit-on-error from tests (PR27768, PR27769)
The exit-on-error flag is necessary to avoid some assertions/unreachables. We
can get past them by creating a few dummy nodes.

Fixes PR27768, PR27769.

Differential Revision: http://reviews.llvm.org/D20726

llvm-svn: 271200
2016-05-30 08:28:34 +00:00
Simon Pilgrim 9602d678cb [X86][SSE] (Reapplied) Replace (V)PMOVSX and (V)PMOVZX integer extension intrinsics with generic IR (llvm)
This patch removes the llvm intrinsics VPMOVSX and (V)PMOVZX sign/zero extension intrinsics and auto-upgrades to SEXT/ZEXT calls instead. We already did this for SSE41 PMOVSX sometime ago so much of that implementation can be reused.

Reapplied now that the companion patch (D20684), which removes/auto-upgrades the clang intrinsics, has been committed.

Differential Revision: http://reviews.llvm.org/D20686

llvm-svn: 271131
2016-05-28 18:03:41 +00:00
Sanjay Patel 97c2c108fd [x86] avoid printing unnecessary sign bits of hex immediates in asm comments (PR20347)
It would be better to check the valid/expected size of the immediate operand, but this is
generally better than what we print right now.

Differential Revision: http://reviews.llvm.org/D20385

llvm-svn: 271114
2016-05-28 14:58:37 +00:00
Ahmed Bougacha a3dc1ba142 [X86] Try to zero elts when lowering 256-bit shuffle with PSHUFB.
Otherwise we fallback to a blend of PSHUFBs later on.

Differential Revision: http://reviews.llvm.org/D19661

llvm-svn: 271113
2016-05-28 14:38:04 +00:00
Rafael Espindola fe796dca90 Fix default reloc model on ARM.
llvm-svn: 271111
2016-05-28 10:41:15 +00:00
Renato Golin 9be88629d5 Revert "Revert "Map DynamicNoPIC to Static on non-darwin.""
This reverts commit r271096, as reverting it broke even more buildbots!

But that also means I'll break on ARM again... :(

llvm-svn: 271099
2016-05-28 04:47:13 +00:00
Renato Golin 4f22c51b09 Revert "Map DynamicNoPIC to Static on non-darwin."
This reverts commit r271052, as it broke some ARM buildbots.

llvm-svn: 271096
2016-05-28 04:24:26 +00:00
Matt Arsenault 1ff389a7bf AMDGPU: Cleanup vector insert/extract tests
This mostly makes sure that 3-vector dynamic inserts
and extracts are covered.

llvm-svn: 271082
2016-05-28 00:51:06 +00:00
Matt Arsenault 7401516985 AMDGPU: Add fract intrinsic
Remove broken patterns matching it. This was matching the
unsafe math pattern and expanding the fix for the buggy instruction
from the pattern. The problems are also on CI. Remove the workarounds
and only use fract with unsafe math or from the intrinsic.

llvm-svn: 271078
2016-05-28 00:19:52 +00:00
Rafael Espindola f9bda6805b Map DynamicNoPIC to Static on non-darwin.
DynamicNoPIC was only ever used on darwin. This maps it to static on
ELF. It matches what is done on X86.

llvm-svn: 271052
2016-05-27 21:44:18 +00:00
Michael Kuperstein a75c77b127 [X86] Detect SAD patterns and emit psadbw instructions.
This recommits r267649 with a fix for PR27539.

Differential Revision: http://reviews.llvm.org/D20598

llvm-svn: 271033
2016-05-27 18:53:22 +00:00
Simon Pilgrim 7e67a22298 [X86][AVX] Removed some remains of old (pre-regeneration) filechecks
llvm-svn: 271007
2016-05-27 15:56:19 +00:00
Than McIntosh 4daf7f13b6 Disable lifetime-start-on-first-use analysis.
Summary:
Turn off lifetime-start-on-first-use enhancement for the moment
pending a fix for bug 27903.

Bug: 27903

Reviewers: tejohnson, wmi, qcolombet, gbiv

Subscribers: llvm-commits

Differential Revision: http://reviews.llvm.org/D20731

llvm-svn: 271003
2016-05-27 15:27:51 +00:00
Simon Pilgrim 4642a57fbf Revert: r270973 - [X86][SSE] Replace (V)PMOVSX and (V)PMOVZX integer extension intrinsics with generic IR (llvm)
llvm-svn: 270976
2016-05-27 09:02:25 +00:00
Simon Pilgrim c013e5737b [X86][SSE] Replace (V)PMOVSX and (V)PMOVZX integer extension intrinsics with generic IR (llvm)
This patch removes the llvm intrinsics VPMOVSX and (V)PMOVZX sign/zero extension intrinsics and auto-upgrades to SEXT/ZEXT calls instead. We already did this for SSE41 PMOVSX sometime ago so much of that implementation can be reused.

A companion patch (D20684) removes/auto-upgrades the clang intrinsics.

Differential Revision: http://reviews.llvm.org/D20686

llvm-svn: 270973
2016-05-27 08:49:15 +00:00
Mitch Bodart 05aeeb5cf1 [CodeGen] Fix problem with X86 byte registers in CriticalAntiDepBreaker
CriticalAntiDepBreaker was not correctly tracking defs of the high X86 byte
registers, leading to incorrect use of a busy register to break an
antidependence.

Fixes pr27681, and its duplicates pr27580, pr27804.

Differential Revision: http://reviews.llvm.org/D20456

llvm-svn: 270935
2016-05-26 23:08:52 +00:00
Krzysztof Parzyszek da0b9a959e [Hexagon] Enable the post-RA scheduler
The aggressive anti-dependency breaker can rename the restored callee-
saved registers. To prevent this, mark these registers as live on all
paths to the return/tail-call instructions, and add implicit use operands
for them to these instructions.

llvm-svn: 270898
2016-05-26 19:44:28 +00:00
Chad Rosier 14aa2ad1f4 [AArch64] Generate rev16/rev32 from bswap + srl when upper bits are known zero.
Canonicalize (srl (bswap i32 x), 16) to (rotr (bswap i32 x), 16), if the high
16-bits of x are zero. Similarly, canonicalize (srl (bswap i64 x), 32) to
(rotr (bswap i64 x), 32), if the high 32-bits of x are zero.

test_rev_w_srl16:            test_rev_w_srl16:
  and w8, w0, #0xffff          and     w8, w0, #0xffff
  rev w8, w8           --->    rev16   w0, w8
  lsr     w0, w8, #16

test_rev_x_srl32:            test_rev_x_srl32:
  rev x8, x8           --->    rev32   x0, x8
  lsr x0, x8, #32
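
The 32-bit case of the canonicalization above can be checked with a small model. This is only an illustrative sketch in Python, not the AArch64 backend code; the helper names `bswap32`, `rotr32` and `rev16_32` are made up for this example.

```python
MASK32 = 0xFFFFFFFF

def bswap32(x):
    """Reverse the four bytes of a 32-bit value."""
    return int.from_bytes((x & MASK32).to_bytes(4, "little"), "big")

def rotr32(x, n):
    """Rotate a 32-bit value right by n bits."""
    n %= 32
    return ((x >> n) | (x << (32 - n))) & MASK32

def rev16_32(x):
    """Model of rev16 on a W register: swap the bytes inside each 16-bit halfword."""
    return ((x & 0x00FF00FF) << 8) | ((x & 0xFF00FF00) >> 8)

# When the high 16 bits of x are zero, the logical shift and the rotate agree,
# and the rotate of a bswap is exactly what rev16 computes.
for x in (0x0000, 0x1234, 0xABCD, 0xFFFF):
    assert bswap32(x) >> 16 == rotr32(bswap32(x), 16) == rev16_32(x)
```

The rotate form equals `rev16_32` for any input; the known-zero high bits are what make it legal to replace the shift with the rotate in the first place.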

llvm-svn: 270896
2016-05-26 19:41:33 +00:00
Changpeng Fang 71369b3a39 AMDGPU/SI: Enable load-store-opt by default.
Summary: Enable load-store-opt by default, and update LIT tests.

Reviewers: arsenm

Differential Revision: http://reviews.llvm.org/D20694

llvm-svn: 270894
2016-05-26 19:35:29 +00:00
Krzysztof Parzyszek 729e7ad31f Add test/CodeGen/MIR/Hexagon/lit.local.cfg
Require that Hexagon is a registered target.

llvm-svn: 270887
2016-05-26 18:35:45 +00:00
Krzysztof Parzyszek 143f684a79 Do not rename registers that do not start an independent live range
llvm-svn: 270885
2016-05-26 18:22:53 +00:00
Artem Belevich 49e9a81236 [NVPTX] Added NVVMIntrRange pass
NVVMIntrRange adds !range metadata to calls of NVVM intrinsics
that return values within known limited range.

This allows LLVM to generate optimal code for indexing arrays
based on tid/ctaid which is a frequently used pattern in CUDA code.

Differential Revision: http://reviews.llvm.org/D20644

llvm-svn: 270872
2016-05-26 17:02:56 +00:00
Simon Pilgrim cf340bd9c1 [X86][SSE] When lowering a 256-bit shuffle as PMOVZX, reduce the input vector to the lower 128-bit subvector.
More often than not this is what it started out as; the extraction is zero-cost on AVX and the PMOVZX/PMOVSX folding logic is based around 128-bit loads.

llvm-svn: 270858
2016-05-26 15:40:36 +00:00
Diana Picus 81bc3170e8 [AMDGPU] Remove exit-on-error flag from test (PR27762)
Similar to r269948, but for argument lowering.

Fixes PR27762

Differential Revision: http://reviews.llvm.org/D20430

llvm-svn: 270856
2016-05-26 15:24:55 +00:00
Diana Picus 20a8d8e97e [BPF] Remove exit-on-error flag in test (PR27767)
The exit-on-error flag is needed to avoid an assert where
llvm::SelectionDAGISel::LowerArguments doesn't create enough arguments. Fill up
with zeroes to reach the right number of args.

Fixes PR27767.

Differential Revision: http://reviews.llvm.org/D20571

llvm-svn: 270855
2016-05-26 15:23:50 +00:00
Simon Pilgrim 50c37ceb3b [X86][SSE] Added load_zext_16i8_to_8i32 test
Odd issue with input vector not being folded into pmovzx on AVX2+ targets

llvm-svn: 270852
2016-05-26 14:45:30 +00:00
Chad Rosier 816a67da49 [AArch64] Generate a BFI/BFXIL from 'or (and X, MaskImm), OrImm'.
If and only if the value being inserted sets only known zero bits.

This combine transforms things like

  and w8, w0, #0xfffffff0
  movz w9, #5
  orr w0, w8, w9

into

  movz w8, #5
  bfxil w0, w8, #0, #4

The combine is tuned to make sure we always reduce the number of instructions.
We avoid churning code for what are expected to be performance-neutral changes
(e.g., converting AND+OR to OR+BFI).
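
The equivalence in the example above can be modelled in a few lines. This is a hedged sketch, assuming the usual BFXIL semantics (copy `width` bits starting at `lsb` of the source into the low bits of the destination, leaving the other destination bits unchanged); the function names are illustrative and not taken from the patch.

```python
MASK32 = 0xFFFFFFFF

def and_or(x, mask_imm, or_imm):
    """The original sequence: and w8, w0, #mask ; orr w0, w8, #imm."""
    return ((x & mask_imm) | or_imm) & MASK32

def bfxil(dst, src, lsb, width):
    """Model of BFXIL: insert src[lsb+width-1:lsb] into the low bits of dst."""
    low = (1 << width) - 1
    field = (src >> lsb) & low
    return (dst & ~low & MASK32) | field

# Same constants as the commit message: mask 0xfffffff0, inserted value 5.
for x in (0x00000000, 0xDEADBEEF, 0xFFFFFFFF):
    assert and_or(x, 0xFFFFFFF0, 5) == bfxil(x, 5, 0, 4)
```

The two forms agree here because the inserted value 5 only touches bits that the mask has already cleared, which is how the "sets only known zero bits" condition is read in this sketch.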

Differential Revision: http://reviews.llvm.org/D20387

llvm-svn: 270846
2016-05-26 13:27:56 +00:00
Rafael Espindola a224de06bc Use shouldAssumeDSOLocal on AArch64.
This reduces code duplication and now AArch64 also handles PIE.

llvm-svn: 270844
2016-05-26 12:42:55 +00:00
Igor Breger 8437bb70fd [AVX512] Fix intrinsic cmp{sd|ss} lowering.
Differential Revision: http://reviews.llvm.org/D20615

llvm-svn: 270843
2016-05-26 12:42:25 +00:00
Simon Pilgrim ab3809193c [X86][F16C] Added F16C fast-isel tests to match clang/test/CodeGen/f16c-builtins.c
llvm-svn: 270837
2016-05-26 10:26:56 +00:00
Simon Pilgrim 0e4fdc0842 [X86][AVX2] Added gather fast-isel tests to match clang/test/CodeGen/avx2-builtins.c
llvm-svn: 270835
2016-05-26 10:07:05 +00:00
Simon Pilgrim d6469e3467 [X86][SSE41] Removed pblendw intrinsics tests - they are auto-upgraded
Equivalent tests included in sse41-intrinsics-x86-upgrade.ll - the i8/i32 immediate diff doesn't matter anymore

llvm-svn: 270767
2016-05-25 21:27:58 +00:00
Simon Pilgrim fa814259ad [X86][SSE41] Regenerated intrinsics tests
llvm-svn: 270764
2016-05-25 21:21:51 +00:00
Simon Pilgrim 1bed207f88 [X86][SSE41] Removed blendpd/blendps intrinsics tests - they are auto-upgraded
Equivalent tests included in sse41-intrinsics-x86-upgrade.ll

llvm-svn: 270761
2016-05-25 21:06:36 +00:00
Simon Pilgrim 971abe8256 [X86][AVX2] Regenerate avx2 vector shift tests
llvm-svn: 270756
2016-05-25 21:00:40 +00:00
Rafael Espindola 84f0562064 Fix shouldAssumeDSOLocal for private linkage.
llvm-svn: 270746
2016-05-25 19:55:16 +00:00
Matt Arsenault e57206d81b AMDGPU: Fix v2i64/v2f64 bitcasts
These operations tend to get promoted away to v4i32 so
this doesn't happen often.

llvm-svn: 270740
2016-05-25 18:07:36 +00:00
Matt Arsenault d89c99c26a AMDGPU: Fix missing br_cc i1 test coverage
Also un-xfail a test.

llvm-svn: 270739
2016-05-25 17:58:27 +00:00
Chad Rosier e5314a94eb [SelectionDAG] Add smarts for BSWAP in computeKnownBits.
llvm-svn: 270738
2016-05-25 17:52:38 +00:00
Matt Arsenault 4578d6a9e1 AMDGPU: Make vectorization defeating test changes
Simplifies test updates in the future.

llvm-svn: 270736
2016-05-25 17:42:39 +00:00
Matt Arsenault 1cc4991412 AMDGPU: Fix inconsistent lowering of select of vectors
f32 vectors would use a sequence of BFI instructions instead
of unrolled cmp + select. This was better in the case of a VALU
select with SGPR inputs, but we don't have a way of dealing with that
in the DAG.

llvm-svn: 270731
2016-05-25 17:34:58 +00:00
Tim Shen fa57367ae5 Move and add comments to the top for tailcall-string-rvo.ll
Differential Revision: http://reviews.llvm.org/D20311

llvm-svn: 270722
2016-05-25 17:01:09 +00:00
Hal Finkel 6f3387f434 [SDAG] Add a fallback multiplication expansion
LegalizeIntegerTypes does not have a way to expand multiplications for large
integer types (i.e. larger than twice the native bit width). There's no
standard runtime call to use in that case, and so we'd just assert.

Unfortunately, as it turns out, it is possible to hit this case from
standard-ish C code in rare cases. A particular case a user ran into yesterday
involved an __int128 induction variable and a loop with a quadratic (not
linear) recurrence which triggered some backend logic using SCEVExpander. In
this case, the BinomialCoefficient code in SCEV generates some i129 variables,
which get widened to i256. At a high level, this is not actually good (i.e. the
underlying optimization, PPCLoopPreIncPrep, should not be transforming the loop
in question for performance reasons), but regardless, the backend shouldn't
crash because of cost-modeling issues in the optimizer.

This is a straightforward implementation of the multiplication expansion, based
on the algorithm in Hacker's Delight. I validated it against the code for the
mul256b function from http://locklessinc.com/articles/256bit_arithmetic/ using
random inputs. There should be no functional change for previously-working code
(the new expansion code only replaces an assert).
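
For readers unfamiliar with the technique, the textbook half-width scheme can be sketched as follows. This is a Python illustration of schoolbook multiplication built from half-width pieces, not the code added to LegalizeIntegerTypes; `mul_wide` and its parameters are hypothetical names, and `bits` is assumed to be even.

```python
def mul_wide(a, b, bits):
    """Return (hi, lo) halves of the 2*bits-wide product of two unsigned
    bits-wide values, using only half-width operand pieces."""
    half = bits // 2
    mask = (1 << half) - 1
    a_lo, a_hi = a & mask, a >> half
    b_lo, b_hi = b & mask, b >> half

    ll = a_lo * b_lo                  # contributes at bit 0
    t = a_hi * b_lo + (ll >> half)    # contributes at bit `half`
    w1 = a_lo * b_hi + (t & mask)     # second cross term plus low part of t
    hi = a_hi * b_hi + (t >> half) + (w1 >> half)
    lo = ((w1 & mask) << half) | (ll & mask)
    return hi, lo

# Check against Python's arbitrary-precision multiply for a 256-bit case.
a = (1 << 256) - 1
b = 0x0123456789ABCDEF << 128 | 0xFEDCBA9876543210
hi, lo = mul_wide(a, b, 256)
assert (hi << 256) | lo == a * b
```

In a legalizer the four partial products would themselves be half-width multiplies that the target supports (or that get expanded again); Python's big integers stand in for that here.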

Fixes PR19797.

llvm-svn: 270720
2016-05-25 16:50:22 +00:00
Sanjay Patel 3955360b24 [x86, AVX] allow explicit calls to VZERO* to modify state in VZeroUpperInserter pass (PR27823)
As noted in the review, there are still problems, so this doesn't fix the bug completely.

Differential Revision: http://reviews.llvm.org/D20529

llvm-svn: 270718
2016-05-25 16:39:47 +00:00
Simon Pilgrim 11081c98a3 [X86][AVX] Sync with clang/test/CodeGen/avx2-builtins.c
Only tests for the gather intrinsic are still to be added

llvm-svn: 270710
2016-05-25 15:30:08 +00:00
Simon Pilgrim 1bcf9847a4 [X86][AVX2] Added more fast-isel tests to match clang/test/CodeGen/avx2-builtins.c
llvm-svn: 270685
2016-05-25 10:56:23 +00:00
Simon Pilgrim c7dcbdc08a [X86][AVX2] Begun adding fast-isel tests to match clang/test/CodeGen/avx2-builtins.c
llvm-svn: 270683
2016-05-25 10:15:06 +00:00
Simon Pilgrim 4d1e258097 [X86][SSE2] Use storeu intrinsics for _mm_storeu_pd/_mm_storeu_pd tests
Also fixed name of _mm_store1_pd test

llvm-svn: 270681
2016-05-25 09:42:29 +00:00
Simon Pilgrim f0ba364fb9 [X86][SSE] Use storeu intrinsics for _mm_storeu_ps test
llvm-svn: 270680
2016-05-25 09:28:06 +00:00
Simon Pilgrim 4298d06d0f [X86][SSE] Replace (V)CVTDQ2PD(Y) and (V)CVTPS2PD(Y) lossless conversion intrinsics with generic IR
Followup to D20528 clang patch, this removes the (V)CVTDQ2PD(Y) and (V)CVTPS2PD(Y) llvm intrinsics and auto-upgrades to sitofp/fpext instead.

Differential Revision: http://reviews.llvm.org/D20568

llvm-svn: 270678
2016-05-25 08:59:18 +00:00
Craig Topper 12e322a8cf [X86] Remove the llvm.x86.sse2.storel.dq intrinsic. It hasn't been used in a long time.
llvm-svn: 270677
2016-05-25 06:56:32 +00:00
Dan Gohman d530f68d45 [WebAssembly] Put __stack_pointer in the offset field of loads and stores.
Instead of this:

i32.const       $push10=, __stack_pointer
i32.load        $push11=, 0($pop10)

Emit this:

i32.const       $push10=, 0
i32.load        $push11=, __stack_pointer($pop10)

It's not currently clear which is better, though there's a chance the second
form may be better at overall compression. We can revisit this when we have
more data; for now it makes sense to make PEI consistent with isel.

Differential Revision: http://reviews.llvm.org/D20411

llvm-svn: 270635
2016-05-24 23:47:41 +00:00
Konstantin Zhuravlyov 29ddd2b2f2 [AMDGPU][NFC] Rename ReserveTrapVGPRs -> ReserveRegs
Differential Revision: http://reviews.llvm.org/D20081

llvm-svn: 270594
2016-05-24 18:37:18 +00:00
Than McIntosh 879ad8fa99 Rework/enhance stack coloring data flow analysis.
Replace bidirectional flow analysis to compute liveness with forward
analysis pass. Treat lifetimes as starting when there is a first
reference to the stack slot, as opposed to starting at the point of the
lifetime.start intrinsic, so as to increase the number of stack
variables we can overlap.
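
A toy model of why starting lifetimes at the first reference helps: two slots whose lifetime.start markers sit at the same point conflict under marker-based ranges, but may not conflict once each range starts at its first use. The sketch below is in Python with made-up slot data and helper names; it does not reflect the actual StackColoring data structures.

```python
def overlaps(a, b):
    """Half-open ranges [start, end) overlap if each starts before the other ends."""
    return a[0] < b[1] and b[0] < a[1]

def live_range(slot, start_on_first_use):
    """Pick the range start: the first reference, or the lifetime.start marker."""
    start = slot["first_use"] if start_on_first_use else slot["lifetime_start"]
    return (start, slot["lifetime_end"])

def can_share(x, y, start_on_first_use):
    """Two slots can share stack space when their live ranges are disjoint."""
    return not overlaps(live_range(x, start_on_first_use),
                        live_range(y, start_on_first_use))

# Hypothetical instruction indices: both markers at 0, uses in disjoint regions.
slot_a = {"lifetime_start": 0, "first_use": 1, "lifetime_end": 4}
slot_b = {"lifetime_start": 0, "first_use": 5, "lifetime_end": 8}

assert not can_share(slot_a, slot_b, start_on_first_use=False)
assert can_share(slot_a, slot_b, start_on_first_use=True)
```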

Reviewers: gbiv, qcolumbet, wmi
Differential Revision: http://reviews.llvm.org/D18827

Bug: 25776
llvm-svn: 270559
2016-05-24 13:23:44 +00:00
Simon Pilgrim caf0d9d92c [X86][SSE] Added vector sitofp/uitofp folded load tests
llvm-svn: 270558
2016-05-24 13:07:23 +00:00
Igor Breger 23c2090606 [llvm][AVX512][intrinsics] Fix vperm{b|w|d|q|ps|pd} intrinsics. The index is the second argument to the builtin function but the first instruction operand.
Differential Revision: http://reviews.llvm.org/D20515

llvm-svn: 270548
2016-05-24 11:06:22 +00:00
Simon Pilgrim 8a5ff3c59a [X86][SSE] Updated (V)CVTDQ2PD(Y) and (V)CVTPS2PD(Y) fast-isel codegen to match D20528
llvm-svn: 270501
2016-05-23 22:17:36 +00:00
Simon Pilgrim 8cfcf586bb [X86][SSE] Added cvtdq2pd/cvtps2pd generic IR tests
Added D20528 implementations as well as existing x86 intrinsics versions

llvm-svn: 270494
2016-05-23 21:45:02 +00:00
Simon Pilgrim f615191fa6 [X86][SSE] Use shuffle/sext instead of deprecated (+ auto-upgraded) pmovsxwd intrinsic call
llvm-svn: 270489
2016-05-23 21:21:38 +00:00
James Y Knight fdcc727da6 [SPARC] Fix 8 and 16-bit atomic load and store.
They were accidentally using the 32-bit load/store instruction for
8/16-bit operations, due to incorrect patterns

(8/16-bit cmpxchg and atomicrmw will be fixed in subsequent changes)

llvm-svn: 270486
2016-05-23 20:33:00 +00:00
Diana Picus b2da61196e [BPF] Remove exit-on-error flag in test (PR27766)
The exit-on-error flag on the many_args1.ll test is needed to avoid an
unreachable in BPFTargetLowering::LowerCall. We can also avoid it by ignoring
any superfluous arguments to the call (i.e. any arguments after the first 5).

Fixes PR27766.

Differential Revision: http://reviews.llvm.org/D20471

v2 of r270419

llvm-svn: 270440
2016-05-23 14:57:19 +00:00
Asaf Badouh d32e4c9f0d [X86][RTM] _xabort() should not have "noreturn" attribute
Differential Revision: http://reviews.llvm.org/D20518

llvm-svn: 270437
2016-05-23 14:04:17 +00:00
Simon Pilgrim 7cc9814aaf [X86][AVX] Added tests that access ymm registers before and after explicit vzeroupper/vzeroall calls
llvm-svn: 270434
2016-05-23 13:03:45 +00:00
Renato Golin 2546b5ac5f Reverts "[BPF] Remove exit-on-error flag in test (PR27766)"
This patch reverts r270419 because it broke a lot of buildbots,
mostly Windows. We'd like help in investigating the issues, but
for now, it should stay out.

llvm-svn: 270433
2016-05-23 13:02:11 +00:00
Simon Pilgrim 4adbf23e1f [X86][SSE] Regenerated scalar load folding tests
llvm-svn: 270431
2016-05-23 12:53:09 +00:00
Simon Pilgrim 07002e86e3 [X86][SSE] Regenerated partial register update tests
llvm-svn: 270430
2016-05-23 12:49:37 +00:00
Simon Pilgrim e699370f3b [X86][SSE] Updated sse/avx cvtsi2sd tests to use non-constant value
llvm-svn: 270425
2016-05-23 12:41:51 +00:00
Simon Pilgrim e6f4d28d6a [X86][SSE2] Regenerated sse2 upgraded intrinsics tests
llvm-svn: 270423
2016-05-23 12:40:11 +00:00
Simon Pilgrim b24542c588 [X86][AVX] Regenerated avx upgraded intrinsics tests
llvm-svn: 270422
2016-05-23 12:39:06 +00:00
Diana Picus eaf34cf67e [BPF] Remove exit-on-error flag in test (PR27766)
The exit-on-error flag on the many_args1.ll test is needed to avoid an
unreachable in BPFTargetLowering::LowerCall. We can also avoid it by ignoring
any superfluous arguments to the call (i.e. any arguments after the first 5).

Fixes PR27766

llvm-svn: 270419
2016-05-23 12:33:34 +00:00
Chris Dewhurst 4f7cac3674 [Sparc][LEON] LEON Erratum fix. Insert NOP after LD or LDF instruction.
Due to an erratum in some versions of LEON, we must insert a NOP after any LD or LDF instruction to ensure the processor has time to load the value correctly before using it. This pass will implement that erratum fix.

The code will have no effect on other, non-LEON Sparc processors.
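
The shape of such a fix-up pass can be sketched over a plain list of instruction strings. This is a simplified Python illustration only; the real pass operates on machine instructions, and the textual mnemonics and the `insert_nop_after_loads` name are assumptions made for the example.

```python
def insert_nop_after_loads(instrs, load_mnemonics=("ld", "ldf")):
    """Return a new instruction list with a nop after every load.

    Each entry is assumed to be a non-empty string whose first word is the mnemonic.
    """
    out = []
    for ins in instrs:
        out.append(ins)
        if ins.split()[0] in load_mnemonics:
            out.append("nop")
    return out

program = ["ld [%o0], %o1", "add %o1, 1, %o1", "ldf [%o2], %f0", "retl"]
fixed = insert_nop_after_loads(program)
assert fixed == ["ld [%o0], %o1", "nop", "add %o1, 1, %o1",
                 "ldf [%o2], %f0", "nop", "retl"]
```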

Differential Review: http://reviews.llvm.org/D20353

llvm-svn: 270417
2016-05-23 10:56:36 +00:00
Craig Topper 95bdabd338 [AVX512] Add patterns to implement stores of extracts of least significant subvectors using XMM or YMM stores instead of the vector extract instructions.
Similar is already done for AVX and we had lost it going to AVX512VL.

llvm-svn: 270383
2016-05-22 23:44:33 +00:00
Simon Pilgrim 1ced2a6390 [X86][SSE] Added extra i8 extract element test
llvm-svn: 270379
2016-05-22 20:35:42 +00:00
Sanjay Patel 2959ff4a88 [x86, AVX] don't add a vzeroupper if that's what the code is already doing (PR27823)
This isn't the complete fix, but it handles the trivial examples of duplicate vzero* ops in PR27823:
https://llvm.org/bugs/show_bug.cgi?id=27823
...and amusingly, the bogus cases already exist as regression tests, so let's take this baby step.

We'll need to do more in the general case where there's legitimate AVX usage in the function + there's
already a vzero in the code.

Differential Revision: http://reviews.llvm.org/D20477

llvm-svn: 270378
2016-05-22 20:22:47 +00:00
Sanjay Patel f71fc95173 [x86, AVX] add test file to show vzeroupper pass excesses
llvm-svn: 270375
2016-05-22 19:55:48 +00:00
Igor Breger 2ba64ab9ae [AVX512] Implement missing patterns for any_extend load lowering.
Differential Revision: http://reviews.llvm.org/D20513

llvm-svn: 270357
2016-05-22 10:21:04 +00:00
Craig Topper a1041ff001 [AVX512] Add an AddedComplexity line to the 512-bit insert_subvector undef index 0 patterns. This gives them higher priority than the memory patterns. This matches AVX1/2.
llvm-svn: 270355
2016-05-22 07:40:40 +00:00
Craig Topper dca03f8596 [X86] Add a common check-prefix to both run lines on a test so identical checks appear just once.
llvm-svn: 270345
2016-05-22 00:39:33 +00:00
Craig Topper 33c550cb95 [AVX512] Add a couple patterns to fix some cases where two vector mask inversions could appear in a row.
llvm-svn: 270344
2016-05-22 00:39:30 +00:00
Craig Topper db960eddfa [AVX512] Add patterns for extracting subvectors and storing to memory.
llvm-svn: 270334
2016-05-21 22:50:14 +00:00
Michael Zuckerman a63a129749 [Clang][AVX512][intrinsics] Fix rcp and sqrt intrinsics.
Differential Revision: http://reviews.llvm.org/D20438

llvm-svn: 270322
2016-05-21 14:44:18 +00:00
Michael Zuckerman 11b55b29d1 [Clang][AVX512][intrinsics] Fix vscalef intrinsics.
Differential Revision: http://reviews.llvm.org/D20324

llvm-svn: 270321
2016-05-21 11:09:53 +00:00
Craig Topper 02626c076b [AVX512] Add patterns for VEXTRACT v16i16->v8i16 and v32i8->v16i8. Disable AVX2 versions of vector extract when AVX512VL is enabled.
llvm-svn: 270318
2016-05-21 07:08:56 +00:00
Craig Topper 22ae353207 [AVX512] Disable AVX2 VPERMD, VPERMQ, VPERMPS, and VPERMPD patterns when AVX512VL is enabled. Also add shuffle comment printing for AVX512VL VPERMPD/VPERMQ to keep some tests that now use these instructions instead of the AVX2 ones.
llvm-svn: 270317
2016-05-21 06:07:18 +00:00
Craig Topper 6be70deda3 [AVX512] Disable AVX/AVX2 VBROADCASTSS/VBROADCASTSD patterns when AVX512VL is enabled.
llvm-svn: 270316
2016-05-21 05:47:25 +00:00
Craig Topper 1a23a521bb [AVX512] Use update_llc_test_checks to update some tests so we can see all the instruction encodings and ensure everything is with EVEX.
llvm-svn: 270315
2016-05-21 05:46:58 +00:00
Craig Topper 73f48f4662 [AVX512] Fix test cases I missed in r270311.
llvm-svn: 270313
2016-05-21 03:59:55 +00:00
Matt Arsenault 7f9eabd2c2 AMDGPU: Define priorities for register classes
Allocating larger register classes first should give better allocation
results (and more importantly for myself, make the lit tests more stable
with respect to scheduler changes).

Patch by Matthias Braun

llvm-svn: 270312
2016-05-21 03:55:07 +00:00
Matt Arsenault 71e6676169 AMDGPU: Cleanup lowering actions
These are kind of a mess and hard to follow, particularly
for loads and stores. Fix various redundant, unnecessary
and dead settings.

llvm-svn: 270307
2016-05-21 02:27:49 +00:00
Matt Arsenault 81a709503d AMDGPU: Fix high bits after division optimization
This is essentially doing a 24-bit signed division with FP.
We need to truncate to the N bit result.

llvm-svn: 270305
2016-05-21 01:53:33 +00:00
Matt Arsenault b6e1cc2a92 AMDGPU: Fix verifier error when spilling SGPRs
The current SGPR spilling test does not stress this
because it is using s_buffer_load instructions to
increase SGPR pressure and spill, but their output
operands have the same SReg_32_XM0 constraint. This fixes
an error when the SReg_32 output from most instructions
is spilled.

llvm-svn: 270301
2016-05-21 00:53:42 +00:00
Matt Arsenault 4945905f5f AMDGPU: Handle cbranch vccz/vccnz
llvm-svn: 270297
2016-05-21 00:29:40 +00:00
Matt Arsenault 72fcd5f597 AMDGPU: Implement ReverseBranchCondition
llvm-svn: 270296
2016-05-21 00:29:34 +00:00
Matt Arsenault 6d09380532 AMDGPU: Implement AnalyzeBranch
Original patch by Tom Stellard

llvm-svn: 270295
2016-05-21 00:29:27 +00:00
Dan Gohman b7c2400fa7 [WebAssembly] Optimize away return instructions using fallthroughs.
This saves a small amount of code size, and is a first small step toward
passing values on the stack across block boundaries.

Differential Review: http://reviews.llvm.org/D20450

llvm-svn: 270294
2016-05-21 00:21:56 +00:00
Matthias Braun 71f9564e7f LiveIntervalAnalysis: Rework constructMainRangeFromSubranges()
We now use LiveRangeCalc::extendToUses() instead of a specially designed
algorithm in constructMainRangeFromSubranges():
- The original motivation for constructMainRangeFromSubranges() was
  differences between the main liverange and subranges because of hidden
  dead definitions. This case however cannot happen anymore with the
  DetectDeadLaneMasks pass in place.
- It simplifies the code.
- This fixes a longstanding bug where we did not properly create new SSA
  values on merging control flow (the MachineVerifier missed most of
  these cases).
- Move constructMainRangeFromSubranges() to LiveIntervalAnalysis and
  LiveRangeCalc to better match the implementation/available helper
  functions.

This re-applies r269016. The fixes from r270290 and r270259 should avoid
the machine verifier problems this time.

llvm-svn: 270291
2016-05-20 23:14:56 +00:00
Matthias Braun e29b7689bd MachineVerifier: subregs do not require defs/valnos on every path
It is fine for subregister ranges to be undefined on some CFG paths as
we may have a "vregX:other_subreg<read-undef> =" def on that path. We
do not (and should not) have live segments for the subregister ranges.
The MachineVerifier should not complain about this.

This is a slight variant of http://llvm.org/PR27705

llvm-svn: 270290
2016-05-20 23:02:13 +00:00
Tim Shen 95e84c5123 [PowerPC] Add a testcase for TCO on string rvo function
Differential Revision: http://reviews.llvm.org/D20311

llvm-svn: 270287
2016-05-20 22:42:01 +00:00
Jacques Pienaar 5ffdef55f0 [lanai] Change reloc to use PIC_ by default and cleanup.
* Change reloc to PIC_;
* Cleanup (clang-format & modify test);

llvm-svn: 270282
2016-05-20 21:41:53 +00:00
Matthias Braun 858d1df246 LiveIntervalAnalysis: Fix missing defs in renameDisconnectedComponents().
Fix renameDisconnectedComponents() creating vreg uses that can be
reached from function begin without having a definition (or explicit
live-in). Fix this by inserting IMPLICIT_DEF instructions before
control-flow joins as necessary.

Removes an assert from MachineScheduler because we may now get
additional IMPLICIT_DEF when preparing the scheduling policy.

This fixes the underlying problem of http://llvm.org/PR27705

llvm-svn: 270259
2016-05-20 19:46:13 +00:00
Jun Bum Lim b21d4e17a2 [AArch64] Disable narrow load merge by default
Summary:
As this optimization converts two loads into one load with two shift instructions,
it could potentially hurt performance if a loop is arithmetic operation intensive.

Reviewers: t.p.northover, mcrosier, jmolloy

Subscribers: evandro, jmolloy, aemerson, rengolin, mcrosier, llvm-commits

Differential Revision: http://reviews.llvm.org/D20172

llvm-svn: 270251
2016-05-20 18:45:49 +00:00
Simon Pilgrim 55ef3da27b [X86][AVX] Generalized matching for target shuffle combines
This patch is a first step towards a more extendible method of matching combined target shuffle masks.

Initially this just pulls out the existing basic mask matches and adds support for some 256/512 bit equivalents. Future patterns will require a number of features to be added but I wanted to keep this patch simple.

I hope we can avoid duplication between shuffle lowering and combining and share more complex pattern match functions in future commits.

Differential Revision: http://reviews.llvm.org/D19198

llvm-svn: 270230
2016-05-20 16:19:30 +00:00
Simon Pilgrim acb71db577 [X86][AVX] Sync with clang/test/CodeGen/avx-builtins.c
llvm-svn: 270229
2016-05-20 16:05:55 +00:00
Rafael Espindola c7e9813228 Refactor X86 symbol access classification.
This refactors the logic in X86 to avoid code duplication. It also
splits it in two steps: it first decides if a symbol is local to the DSO
and then uses that information to decide how to access it.

The first part is implemented by shouldAssumeDSOLocal. It is not in any
way specific to X86. In a followup patch I intend to move it to
somewhere common and reuse it in other backends.

llvm-svn: 270209
2016-05-20 12:20:10 +00:00
Rafael Espindola 8571aa3d5d Simplify handling of hidden stubs on PowerPC.
We now handle them just like non hidden ones. This was already the case
on x86 (r207518) and arm (r207517).

llvm-svn: 270205
2016-05-20 12:00:52 +00:00
Chris Dewhurst 0dfa6bc004 [Sparc] Enable more inline assembly constraints.
Note: This is specifically to allow GCC's test pr44707 to pass.

Trivial change, not put for differential revision. Test included.

llvm-svn: 270192
2016-05-20 09:03:01 +00:00
Craig Topper 25363178bb [X86] Run the AVX/AVX2 intrinsic tests in AVX512VL mode too just to make sure we don't break any older intrinsics.
llvm-svn: 270183
2016-05-20 05:10:32 +00:00
Craig Topper 565463fbba Revert accidental commit of a test command line addition.
llvm-svn: 270175
2016-05-20 02:01:51 +00:00
Craig Topper 0a7a8dee2b [X86] Fix some AVX patterns to only be disabled if VLX and BWI are supported. Without this we get isel failures on the avx-intrinsics-x86.ll test in AVX512VL.
llvm-svn: 270174
2016-05-20 02:00:08 +00:00
Matthew Simpson 476c0afc01 [ARM, AArch64] Match additional patterns to ldN instructions
When matching an interleaved load to an ldN pattern, the interleaved access
pass checks that all users of the load are shuffles. If the load is used by an
instruction other than a shuffle, the pass gives up and an ldN is not
generated. This patch considers users of the load that are extractelement
instructions. It attempts to modify the extracts to use one of the available
shuffles rather than the load. After the transformation, the load is only used
by shuffles and will then be matched with an ldN pattern.

Differential Revision: http://reviews.llvm.org/D20250

llvm-svn: 270142
2016-05-19 21:39:00 +00:00
Hans Wennborg 172eee9cfc X86: Don't reset the stack after calls that don't return (PR27117)
Since the calls don't return, the instruction afterwards will never run,
and is just taking up unnecessary space in the binary.

Differential Revision: http://reviews.llvm.org/D20406

llvm-svn: 270109
2016-05-19 20:15:33 +00:00
Sanjay Patel c48a879ef8 [x86] add tests for urem lowering
llvm-svn: 270096
2016-05-19 18:57:54 +00:00
Simon Pilgrim 7a8dcf2556 [X86][SSE] Added fast-isel tests to sync with clang/test/CodeGen/sse-builtins.c
llvm-svn: 270081
2016-05-19 16:55:52 +00:00
Simon Pilgrim b1ff2dd145 [X86][SSE2] Fixed shuffle of results in _mm_cmpnge_sd/_mm_cmpngt_sd tests
llvm-svn: 270080
2016-05-19 16:49:53 +00:00
Chad Rosier 02f25a9565 [AArch64] Generate a BFXIL from 'or (and X, Mask0Imm),(and Y, Mask1Imm)'.
Mask0Imm and ~Mask1Imm must be equivalent and one of the MaskImms is a shifted
mask (e.g., 0x000ffff0).  Both 'and's must have a single use.

This changes code like:

  and w8, w0, #0xffff000f
  and w9, w1, #0x0000fff0
  orr w0, w9, w8

into

  lsr w8, w1, #4
  bfi w0, w8, #4, #12

llvm-svn: 270063
2016-05-19 14:19:47 +00:00
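The equivalence behind the r270063 rewrite can be checked with a small model. This is only an illustrative sketch, not code from the patch: the helper names `or_of_ands`, `bfi`, and `lsr_then_bfi` are hypothetical, and `bfi` models a 32-bit bitfield insert (copy the low `width` bits of `src` into `dst` starting at bit `lsb`).

```python
MASK32 = 0xFFFFFFFF  # model 32-bit w registers


def or_of_ands(x, y):
    # Original sequence: and/and/orr with complementary masks.
    return ((x & 0xFFFF000F) | (y & 0x0000FFF0)) & MASK32


def bfi(dst, src, lsb, width):
    # Hypothetical model of a bitfield insert: low `width` bits of src
    # replace dst[lsb : lsb + width]; all other bits of dst are kept.
    field = (1 << width) - 1
    return ((dst & ~(field << lsb)) | ((src & field) << lsb)) & MASK32


def lsr_then_bfi(x, y):
    # Rewritten sequence: lsr w8, w1, #4 ; bfi w0, w8, #4, #12
    return bfi(x & MASK32, (y & MASK32) >> 4, 4, 12)


if __name__ == "__main__":
    x, y = 0x12345678, 0x9ABCDEF0
    print(hex(or_of_ands(x, y)), hex(lsr_then_bfi(x, y)))
```

Both functions keep bits 31..16 and 3..0 of the first operand and take bits 15..4 from the second, which is why the masks must be complements of each other.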
Ranjeet Singh dbbbef5401 [ARM] Add cdp intrinsic tests.
- Renamed intrinsics.ll to intrinsics-coprocessor.ll
  as all the tests were testing coprocessor instructions,
  also made the test checks match the full instruction.

Differential Revision: http://reviews.llvm.org/D20393

llvm-svn: 270057
2016-05-19 12:59:17 +00:00
Simon Pilgrim 47825fad71 [X86][SSE2] Added _mm_move_* tests
llvm-svn: 270046
2016-05-19 11:59:57 +00:00
Simon Pilgrim 01809e0506 [X86][SSE2] Added _mm_cast* and _mm_set* tests
llvm-svn: 270041
2016-05-19 10:58:54 +00:00
Daniel Sanders 2f2ab5102c [mips][mips16] Fix ZERO is not a CPU16Regs register error from the machine verifier.
Summary: Partially fixes PR27458

Reviewers: sdardis

Subscribers: dsanders, llvm-commits, sdardis

Differential Revision: http://reviews.llvm.org/D20330

llvm-svn: 270037
2016-05-19 10:42:14 +00:00
Andrey Turetskiy 45b22a4aff [X86] Enable RRL part of the LEA optimization pass for -O2.
Enable "Remove Redundant LEAs" part of the LEA optimization pass for -O2.
This gives a 6.4% performance improvement on Broadwell on the nnet benchmark from Coremark-pro.
There is no significant effect on other benchmarks (Geekbench, Spec2000, Spec2006).

Differential Revision: http://reviews.llvm.org/D19659

llvm-svn: 270036
2016-05-19 10:18:29 +00:00
Dan Gohman 537bc9b9f5 [WebAssembly] Make several CHECK lines less fragile using regexes and CHECK-DAG.
llvm-svn: 270011
2016-05-19 01:52:56 +00:00
Matt Arsenault c438ef574d AMDGPU: Fix promote alloca for pointer loads
If the load has a pointer type, we don't want to change
its type.

llvm-svn: 270000
2016-05-18 23:20:24 +00:00
Rafael Espindola 8c34dd8257 Delete Reloc::Default.
Having an enum member named Default is quite confusing: Is it distinct
from the others?

This patch removes that member and instead uses Optional<Reloc> in
places where we have a user input that still hasn't been mapped to the
default value, which now clearly has to be one of the remaining 3
options.

llvm-svn: 269988
2016-05-18 22:04:49 +00:00
Krzysztof Parzyszek 14a1c18448 When looking for a spill slot in reg scavenger, find one that matches RC
When looking for an available spill slot, the register scavenger would stop
after finding the first one with no register assigned to it. That slot may
have size and alignment that do not meet the requirements of the register
that is to be spilled. Instead, find an available slot that is the closest
in size and alignment to one that is needed to spill a register from RC.

Differential Revision: http://reviews.llvm.org/D20295

llvm-svn: 269969
2016-05-18 18:16:00 +00:00
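The "closest in size and alignment" selection described in r269969 can be sketched as a best-fit search. This is a minimal model under stated assumptions, not the register scavenger's actual code: the `Slot` type, the `pick_slot` helper, and the cost function (sum of the size and alignment excess) are all hypothetical.

```python
from dataclasses import dataclass
from typing import Optional, Sequence


@dataclass
class Slot:
    size: int             # bytes available in the slot
    align: int            # alignment of the slot
    in_use: bool = False  # True if a register is already assigned to it


def pick_slot(slots: Sequence[Slot], need_size: int, need_align: int) -> Optional[int]:
    """Return the index of the free slot that fits with the least excess."""
    best, best_cost = None, None
    for i, s in enumerate(slots):
        # Skip occupied slots and slots that cannot hold the register.
        if s.in_use or s.size < need_size or s.align < need_align:
            continue
        # Assumed cost: how much larger and more aligned than required.
        cost = (s.size - need_size) + (s.align - need_align)
        if best_cost is None or cost < best_cost:
            best, best_cost = i, cost
    return best
```

Taking the first free slot, as the old behavior did, would return index 0 here even when that slot is too small for the register being spilled.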
Simon Pilgrim 5a0d728181 [X86][SSE2] Added fast-isel tests to sync with clang/test/CodeGen/sse2-builtins.c
llvm-svn: 269966
2016-05-18 18:00:43 +00:00
Matt Arsenault 1735da460b AMDGPU: Other sizes of popcnt are fast
We can chain bcnt instructions together, so
any width popcnt is pretty fast.

llvm-svn: 269950
2016-05-18 16:10:19 +00:00
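The chaining mentioned in r269950 can be modeled as follows. This is a sketch that assumes a bcnt-style instruction returning the population count of its first operand plus an accumulator operand; `bcnt_u32` and `popcount_wide` are hypothetical names, not functions from the backend.

```python
def bcnt_u32(src, acc):
    # Assumed semantics: popcount of the low 32 bits of src, plus acc.
    return (bin(src & 0xFFFFFFFF).count("1") + acc) & 0xFFFFFFFF


def popcount_wide(value, bits):
    # Chain one bcnt per 32-bit piece, feeding each result into the next.
    value &= (1 << bits) - 1
    total = 0
    for shift in range(0, bits, 32):
        total = bcnt_u32(value >> shift, total)
    return total
```

A 64-bit popcnt then costs two chained instructions in this model, with no separate add needed to combine the halves.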
Hans Wennborg 8eb336c14e Re-commit r269828 "X86: Avoid using _chkstk when lowering WIN_ALLOCA instructions"
with an additional fix to make RegAllocFast ignore undef physreg uses. It would
previously get confused about the "push %eax" instruction's use of eax. That
method for adjusting the stack pointer is used in X86FrameLowering::emitSPUpdate
as well, but since that runs after register-allocation, we didn't run into the
RegAllocFast issue before.

llvm-svn: 269949
2016-05-18 16:10:17 +00:00
Matt Arsenault 9430b9113a AMDGPU: Fix assert when erroring on a call
For some reason an assert is now hit when a valid chain
is not returned, so return the entry chain.

llvm-svn: 269948
2016-05-18 16:10:11 +00:00
Matt Arsenault 891fccc0c1 AMDGPU: Handle alloca promoting with null operands
If the second pointer in a multi-pointer instruction is
a constant, we can replace the type.

llvm-svn: 269945
2016-05-18 15:57:21 +00:00
Matt Arsenault 71fa1f375e AMDGPU: Fix a few slightly broken tests
Fix minor bugs and uses of undef which break when
pointer related optimization passes are run.

llvm-svn: 269944
2016-05-18 15:48:44 +00:00
Krzysztof Parzyszek ca3b532e2c [Hexagon] Recognize "q" and "v" in inline-asm as register constraints
llvm-svn: 269933
2016-05-18 14:34:51 +00:00
Dan Gohman b4c3c38276 [WebAssembly] Don't expand divisions by constants.
Don't expand divisions by constants if it would require multiple instructions.
The current assumption is that engines will perform the desired optimizations.

llvm-svn: 269930
2016-05-18 14:29:42 +00:00
Simon Pilgrim 9829df5d56 [X86][SSE42] Added fast-isel tests to sync with clang/test/CodeGen/sse42-builtins.c
llvm-svn: 269929
2016-05-18 14:28:54 +00:00
Simon Pilgrim 3b93835f5d [X86][SSE41] Sync with clang/test/CodeGen/sse41-builtins.c
llvm-svn: 269925
2016-05-18 13:46:10 +00:00