Commit Graph

14697 Commits

Author SHA1 Message Date
Joseph Tremoulet 81e81960e3 [WinEH] Verify consistent funclet unwind exits
Summary:
A funclet EH pad may be exited by an unwind edge, which may be a
cleanupret exiting its cleanuppad, an invoke exiting a funclet, or an
unwind out of a nested funclet transitively exiting its parent.  Funclet
EH personalities require all such exceptional exits from a given funclet to
have the same unwind destination, and EH preparation / state numbering /
table generation implicitly depends on this.  Formalize it as a rule of
the IR in the LangRef and verifier.


Reviewers: rnk, majnemer, andrew.w.kaylor

Subscribers: llvm-commits

Differential Revision: http://reviews.llvm.org/D15962

llvm-svn: 257273
2016-01-10 04:30:02 +00:00
Michael Zolotukhin 0fc89c67cc Revert "[BranchFolding] Set correct mem refs"
This reverts commit 1ff11017d2669b933b29fcbb6451cfcda34ad693.

llvm-svn: 257270
2016-01-09 23:53:16 +00:00
Simon Pilgrim c7bebcbfd8 [X86][AVX] Match broadcast loads through a bitcast
AVX1 v8i32/v4i64 shuffles are bitcasted to v8f32/v4f64, this patch peeks through any bitcast to check for a load node to allow broadcasts to occur.

This is a re-commit of r257055 after r257264 fixed 32-bit broadcast loads of i64 scalars.

llvm-svn: 257266
2016-01-09 20:59:39 +00:00
Simon Pilgrim 2e7a1849c9 [X86][AVX] Add support for i64 broadcast loads on 32-bit targets
Added 32-bit AVX1/AVX2 broadcast tests.

llvm-svn: 257264
2016-01-09 19:59:27 +00:00
Junmo Park e1582cec34 [BranchFolding] Set correct mem refs
Merge MBBICommon and MBBI's MMOs.

Differential Revision: http://reviews.llvm.org/D15990

llvm-svn: 257253
2016-01-09 07:30:13 +00:00
Sanjay Patel 1dc7dfb9d9 [DAGCombiner] don't dereference an operand that doesn't exist (PR26070)
The bug was introduced with changes for x86-64 fp128:
http://reviews.llvm.org/rL254653

I don't know why an x86 change is here, so I'll follow up in:
http://reviews.llvm.org/D15134

Should fix:
https://llvm.org/bugs/show_bug.cgi?id=26070

llvm-svn: 257200
2016-01-08 19:53:24 +00:00
Weiming Zhao 4b3b13d3bc RBIT Instruction only available for ARMv6t2 and above.
Summary:
r255334 matches bit-reverse pattern in InstCombine and generates calls to Instrinsic::bitreverse.

RBIT instruction is only available for ARMv6t2 and above. This patch has the intrinsic expanded during legalization for ARMv4 and ARMv5.

Patch by Z. Zheng <zhaoshiz@codeaurora.org>

Reviewers: apazos, jmolloy, weimingz

Subscribers: aemerson, rengolin, llvm-commits

Differential Revision: http://reviews.llvm.org/D15932

llvm-svn: 257188
2016-01-08 18:43:41 +00:00
Pirama Arumuga Nainar bf5ccdccb2 Do not ASSERTZEXT for i16 result of bitcast from f16 operand
Summary:
During legalization if i16, do not ASSERTZEXT the result of FP_TO_FP16.
Directly return an FP_TO_FP16 node with return type as the
promote-to-type of i16.

This patch also removes extraneous length check.  This legalization
should be valid even if integer and float types are of different
lengths.

This patch breaks a hard-float test for fp16 args.  The test is changed
to allow a vmov to zero-out the top bits, and also ensure that the
return value is in an FP register.

Reviewers: ab, jmolloy

Subscribers: srhines, llvm-commits

Differential Revision: http://reviews.llvm.org/D15438

llvm-svn: 257184
2016-01-08 17:46:05 +00:00
David Majnemer 2a6368f609 [WinEH] CatchHandler which don't have catch objects in StackColoring
StackColoring rewrites the frame indicies of operations involving
allocas if it can find that the life time of two objects do not overlap.
MSVC EH needs to be kept aware of this if happens in the event that a
catch object has moved around.  However, we represent the non-existance
of a catch object with a sentinel frame index (INT_MAX).  This sentinel
also happens to be the EmptyKey of the SlotRemap DenseMap.  Testing for
whether or not we need to translate the frame index fails in this case
because we call the count method on the DenseMap with the EmptyKey,
leading to assertions.  Instead, check if it is our sentinel value
before trying to look into the DenseMap.

This fixes PR26073.

llvm-svn: 257182
2016-01-08 17:24:47 +00:00
Tom Stellard 4c4c72db48 AMDGPU/SI: Emit global variable sizes when targeting HSA
Reviewers: arsenm

Subscribers: arsenm, llvm-commits

Differential Revision: http://reviews.llvm.org/D15952

llvm-svn: 257173
2016-01-08 14:50:28 +00:00
Tom Stellard ad8f5e8111 AMDGPU: Emit functions sizes
Reviewers: arsenm

Subscribers: arsenm, llvm-commits

Differential Revision: http://reviews.llvm.org/D15951

llvm-svn: 257172
2016-01-08 14:50:23 +00:00
David Majnemer 086fec23ec [WinEH] Update WinEHFuncInfo if StackColoring merges allocas
Windows EH keeping track of which frame index corresponds to a catchpad
in order to inform the runtime where the catch parameter should be
initialized.  LLVM's optimizations are able to prove that the memory
used by the catch parameter can be reused with another memory
optimization, changing it's frame index.

We need to keep WinEHFuncInfo up to date with respect to this or we will
miscompile/assert.

This fixes PR26069.

llvm-svn: 257158
2016-01-08 08:03:55 +00:00
Craig Topper 04493fda81 [X86] Don't print the aliased version of CVTSD2SI64rm. This appears to be a mistake I made years ago.
llvm-svn: 257149
2016-01-08 06:09:18 +00:00
Kyle Butt bfcff3856a Add call sequence start and end for __tls_get_addr
This is a fix for bug http://llvm.org/bugs/show_bug.cgi?id=25839.

For a PIC TLS variable access in a function, prologue (mflr followed by std and
stdu) gets scheduled after a tls_get_addr call. tls_get_addr messed up LR but
no one saves/restores it.

Also added a test for save/restore clobbered registers during calling __tls_get_addr.

Patch by Tim Shen

llvm-svn: 257137
2016-01-08 02:06:19 +00:00
Eric Christopher b793230797 Add some testing for thumb1 and thumb2 inline asm immediate constraints
and fix a couple of bugs on inspection.

Also fixes PR26061.

llvm-svn: 257122
2016-01-08 00:34:44 +00:00
JF Bastien b9ec4c6cea WebAssembly: use .skip instead of .zero directive
.zero is confusing when used with two arguments. Documentation:

  This directive emits SIZE 0-valued bytes.  SIZE must be an absolute
  expression.  This directive is actually an alias for the '.skip'
  directive so in can take an optional second argument of the value to
  store in the bytes instead of zero.  Using '.zero' in this way would be
  confusing however.

Ref: https://sourceware.org/bugzilla/show_bug.cgi?id=18353

Hexagon and Sparc do the same, and it's all the same to WebAssembly so
let's pick the less confusing of the two.

llvm-svn: 257111
2016-01-07 23:18:29 +00:00
Keno Fischer ea33a25816 Temporarily revert r257105 "[Verifier] Check that debug values have proper size"
Looks like there's a case where clang generates debug info that triggers
the new verifier check. Reverting while investigating.

llvm-svn: 257107
2016-01-07 22:39:11 +00:00
Keno Fischer b3326be6ad [Verifier] Check that debug values have proper size
Summary:
Teach the Verifier to make sure that the storage size given to llvm.dbg.declare
or the value size given to llvm.dbg.value agree with what is declared in
DebugInfo. This is implicitly assumed in a number of passes (e.g. in SROA).
Additionally this catches a number of common mistakes, such as passing a
pointer when a value was intended or vice versa.

One complication comes from stack coloring which modifies the original IR when
it merges allocas in order to make sure that if AA falls back to the IR it gets
the correct result. However, given this new invariant, indiscriminately
replacing one alloca by a different (differently sized one) is no longer valid.
Fix this by just undefing out any use of the alloca in a dbg.declare in this
case.

Additionally, I had to fix a number of test cases. Of particular note:
- I regenerated dbg-changes-codegen-branch-folding.ll from the given source as
  it was affected by the bug fixed in r256077
- two-cus-from-same-file.ll was changed to avoid having a variable-typed debug
  variable as that would depend on the target, even though this test is
  supposed to be generic
- I had to manually declared size/align for reference type. See also the
  discussion for D14275/r253186.
- fpstack-debuginstr-kill.ll required changing `double` to `long double`
- most others were just a question of adding OP_deref

Reviewers: aprantl
Differential Revision: http://reviews.llvm.org/D14276

llvm-svn: 257105
2016-01-07 22:18:37 +00:00
Derek Schuff 9bfea27c26 [WebAssembly] Support combining GEP and FrameIndex offsets in memory operand offset field
Previously we only supported putting the FI into memory operand offset
fields if there was nothing there already. Now combine them.

Differential Revision: http://reviews.llvm.org/D15941

llvm-svn: 257084
2016-01-07 18:55:52 +00:00
Dan Gohman a4730cf0b4 [WebAssembly] Use the default private label prefixes.
The MC assembler doesn't like using the empty string as a private label
prefix because then it treats all labels as private. This commit reverts
back to the default prefix, which is .L, which is common in ELF targets
and consistent with the LLVM name mangler.

llvm-svn: 257083
2016-01-07 18:49:53 +00:00
Nicolai Haehnle 82fc962c20 AMDGPU/SI: Fold operands with sub-registers
Summary:
Multi-dword constant loads generated unnecessary moves from SGPRs into VGPRs,
increasing the code size and VGPR pressure. These moves are now folded away.

Note that this lack of operand folding was not a problem for VMEM loads,
because COPY nodes from VReg_Nnn to VGPR32 are eliminated by the register
coalescer.

Some tests are updated, note that the fsub.ll test explicitly checks that
the move is elided.

With the IR generated by current Mesa, the changes are obviously relatively
minor:

7063 shaders in 3531 tests
Totals:
SGPRS: 351872 -> 352560 (0.20 %)
VGPRS: 199984 -> 200732 (0.37 %)
Code Size: 9876968 -> 9881112 (0.04 %) bytes
LDS: 91 -> 91 (0.00 %) blocks
Scratch: 1779712 -> 1767424 (-0.69 %) bytes per wave
Wait states: 295164 -> 295337 (0.06 %)

Totals from affected shaders:
SGPRS: 65784 -> 66472 (1.05 %)
VGPRS: 38064 -> 38812 (1.97 %)
Code Size: 1993828 -> 1997972 (0.21 %) bytes
LDS: 42 -> 42 (0.00 %) blocks
Scratch: 795648 -> 783360 (-1.54 %) bytes per wave
Wait states: 54026 -> 54199 (0.32 %)

Reviewers: tstellarAMD, arsenm, mareko

Subscribers: arsenm, llvm-commits

Differential Revision: http://reviews.llvm.org/D15875

llvm-svn: 257074
2016-01-07 17:10:29 +00:00
Nicolai Haehnle 3c05d6d3b5 AMDGPU/SI: xnack_mask is always reserved on VI
Summary:
Somehow, I first interpreted the docs as saying space for xnack_mask is only
reserved when XNACK is enabled via SH_MEM_CONFIG. I felt uneasy about this and
went back to actually test what is happening, and it turns out that xnack_mask
is always reserved at least on Tonga and Carrizo, in the sense that flat_scr
is always fixed below the SGPRs that are used to implement xnack_mask, whether
or not they are actually used.

I confirmed this by writing a shader using inline assembly to tease out the
aliasing between flat_scratch and regular SGPRs. For example, on Tonga, where
we fix the number of SGPRs to 80, s[74:75] aliases flat_scratch (so
xnack_mask is s[76:77] and vcc is s[78:79]).

This patch changes both the calculation of the total number of SGPRs and the
various register reservations to account for this.

It ought to be possible to use the gap left by xnack_mask when the feature
isn't used, but this patch doesn't try to do that. (Note that the same applies
to vcc.)

Note that previously, even before my earlier change in r256794, the SGPRs that
alias to xnack_mask could end up being used as well when flat_scr was unused
and the total number of SGPRs happened to fall on the right alignment
(e.g. highest regular SGPR being used s29 and VCC used would lead to number
of SGPRs being 32, where s28 and s29 alias with xnack_mask). So if there
were some conflict due to such aliasing, we should have noticed that already.

Reviewers: arsenm, tstellarAMD

Subscribers: arsenm, llvm-commits

Differential Revision: http://reviews.llvm.org/D15898

llvm-svn: 257073
2016-01-07 17:10:20 +00:00
Michael Zuckerman 5b1cad87aa [avx512] Fix test avx512bw-intrinsics.ll
Change the CHECK lablel into AVX512BW 
And fix declare lable of llvm.x86.avx512.mask.psrav32_hi 

llvm-svn: 257071
2016-01-07 16:25:42 +00:00
Michael Zuckerman 3aca221b31 [AVX512] add PSLLW and PSLLV Intrinsic
Differential Revision: http://reviews.llvm.org/D15889

llvm-svn: 257070
2016-01-07 16:02:51 +00:00
Nico Weber 4324b9b236 Revert r257055, it caused PR26064.
llvm-svn: 257066
2016-01-07 15:01:46 +00:00
Michael Zuckerman 354152d590 [AVX512] add PSRAV Intrinsic
Differential Revision: http://reviews.llvm.org/D15856

llvm-svn: 257063
2016-01-07 14:42:20 +00:00
Michael Zuckerman a6df006b50 [AVX512] add PSHUFHW and PSHUFLW Intrinsic
Differential Revision: http://reviews.llvm.org/D15925

llvm-svn: 257056
2016-01-07 12:35:43 +00:00
Simon Pilgrim bcc11a059e [X86][AVX] Match broadcast loads through a bitcast
AVX1 v8i32/v4i64 shuffles are bitcasted to v8f32/v4f64, this patch peeks through bitcasts to check for a load node to allow broadcasts to occur.

Follow up to D15310

llvm-svn: 257055
2016-01-07 11:34:27 +00:00
Simon Pilgrim 83e44c66ae [X86][SSE} Add INSERTPS as a target shuffle
Follow up to D15378, added INSERTPS to the list of decodable target shuffles and enabled XFormVExtractWithShuffleIntoLoad to handle target shuffles with SentinelZero and tested this with INSERTPS.

llvm-svn: 257046
2016-01-07 10:24:19 +00:00
Michael Zuckerman 4a1566827d [AVX512] add PSHUFD Intrinsic
Differential Revision: http://reviews.llvm.org/D15934

llvm-svn: 257044
2016-01-07 09:24:12 +00:00
Tim Northover bd41cf880c ARM: support TLS accesses on Darwin platforms
Darwin TLS accesses most closely resemble ELF's general-dynamic situation,
since they have to be able to handle all possible situations. The descriptors
and so on are obviously slightly different though.

llvm-svn: 257039
2016-01-07 09:03:03 +00:00
NAKAMURA Takumi 7e887bd80d llvm/test/CodeGen/X86/statepoint-vector.ll REQUIRES asserts due to a debug option.
llvm-svn: 257031
2016-01-07 05:40:37 +00:00
Philip Reames cffc628ca1 One more attempt at stablizing a test on all platforms.
llvm-svn: 257026
2016-01-07 04:20:52 +00:00
Philip Reames afdbcc6a84 [Statepoints] Add test cases around vectors and stablize test
Unlike my comment in 257022 said, it turns out we do handle constant vectors in the statepoint lowering, but only because SelectionDAG doesn't actually produce constants for them.  Add a couple of tests which show this working.

Also, add a triple to the same test file to hopefully fix a failing bot.

It turns out we do han

llvm-svn: 257025
2016-01-07 04:15:31 +00:00
Haicheng Wu 08b9462540 [AArch64 MachineCombine] Enhance/Add support for general reassociation to reduce the critical path
Allow fadd/fmul to be reassociated in aarch64.

llvm-svn: 257024
2016-01-07 04:01:02 +00:00
Philip Reames 3e2cf5320c [Statepoints] Initial support for relocating vectors of pointers
Currently, we try to split vectors of pointers back into their component pointer elements during rewrite-statepoints-for-gc. This is less than ideal since presumably the vectorizer chose to vectorize for a reason. :) It's also been a source of bugs - in particular, the relocation logic as currently implemented was recently discovered to be wrong.

The alternate approach is to allow gc.relocates of vector-of-pointer type and update the backend to handle them. That's what this patch tries to do. This won't actually enable vector-of-pointers in practice - there are some RS4GC changes needed - but the lowering is standalone and testable so it makes sense to separate.

Note that there are some known cases around vector constants which this patch does not handle. Once this is in, I'll send another patch with individual fixes and test cases. 

Differential Revision: http://reviews.llvm.org/D15632

llvm-svn: 257022
2016-01-07 03:32:11 +00:00
Dan Gohman 0c6f5ac50a [WebAssembly] Add -m:e to the target triple.
This enables ELF-style name mangling, which primarily means using ".L" for
private symbols.

llvm-svn: 257020
2016-01-07 03:19:23 +00:00
Quentin Colombet 9ed52e9a9e [ShrinkWrapping] Give up on irreducible CFGs.
We need to know whether or not a given basic block is in a loop for the analysis
to be correct.
Loop information may be incomplete on irreducible CFGs, therefore we may
generate incorrect code if we use it in those situations.

This fixes PR25988.

llvm-svn: 257012
2016-01-07 01:23:49 +00:00
Nicolai Haehnle a61e5a8d4e AMDGPU/SI: Fix crash when inline assembly is used in a graphics shader
Summary:
This is admittedly something that you could only run into by manually
playing around with shader assembly because the SITypeWriter pass is
skipped for compute.

Reviewers: arsenm, tstellarAMD

Subscribers: arsenm, llvm-commits

Differential Revision: http://reviews.llvm.org/D15902

llvm-svn: 256980
2016-01-06 22:01:04 +00:00
Quentin Colombet eb61e8e6b0 [X86] Correctly model TLS calls w.r.t. frame requirements.
TLS calls need the stack frame to be properly set up and this
implies that such calls need ADJUSTSTACK_xxx markers.

Fixes PR25820.

llvm-svn: 256959
2016-01-06 19:09:26 +00:00
Michael Kuperstein 037c9984db [ShrinkWrap] Fix FindIDom to only have one kind of failure.
FindIDom() can fail in two different ways - it can either return nullptr or the
block itself, depending on the circumstances. Some users of FindIDom() check
one error condition, while others check the other.

Change it to always return nullptr on failure.
This fixes PR26004.

Differential Revision: http://reviews.llvm.org/D15847

llvm-svn: 256955
2016-01-06 18:40:11 +00:00
Dan Gohman 8f59cf756f [WebAssembly] Don't use range-based loop for a list that's being modified
The first instruction in a block is what the rend() iterator points to, so
if it moves, we need to re-evaluate rend() so that we continue to iterate
through the rest of the instructions.

llvm-svn: 256953
2016-01-06 18:29:35 +00:00
Geoff Berry 12fe2279f3 ScheduleDAGInstrs: Bug fix for missed memory dependency.
Summary:
In buildSchedGraph(), when adding memory dependencies for loads, move
the call to adjustChainDeps() after the call to
addChainDependency(AliasChain) to handle the case where
addChainDependency(AliasChain) ends up not adding a dependency and
instead putting the SU on the RejectMemNodes list.  The call to
adjustChainDeps() must be done after the call to addChainDependency() in
order to process the SU added to the RejectMemNodes list to create
memory dependencies for it.

Reviewers: hfinkel, atrick, jonpa, resistor

Subscribers: mcrosier, llvm-commits

Differential Revision: http://reviews.llvm.org/D15927

llvm-svn: 256950
2016-01-06 18:14:26 +00:00
Dan Gohman c04ccb66eb [WebAssembly] Add -asm-verbose=false to llc tests.
In general, disabling comments in the output reduces the chances of a
CHECK line accidentally matching a comment instead of its intended text.

llvm-svn: 256946
2016-01-06 16:45:05 +00:00
Artyom Skrobov 51f2d11be9 PR25754: avoid generating UDIVREM8_ZEXT_HREG nodes with i64 result
Reviewers: spatel, srking

Subscribers: llvm-commits

Differential Revision: http://reviews.llvm.org/D15331

llvm-svn: 256924
2016-01-06 09:41:10 +00:00
Simon Pilgrim 267163e713 [X86][SSE] There is no zmm addsubpd/addsubps instruction.
Replace the assert in combineShuffleToAddSub with an early out.

llvm-svn: 256922
2016-01-06 09:08:49 +00:00
Dan Gohman 797f639e79 [SelectionDAGBuilder] Set NoUnsignedWrap for inbounds gep and load/store offsets.
In an inbounds getelementptr, when an index produces a constant non-negative
offset to add to the base, the add can be assumed to not have unsigned overflow.

This relies on the assumption that addresses can't occupy more than half the
address space, which isn't possible in C because it wouldn't be possible to
represent the difference between the start of the object and one-past-the-end
in a ptrdiff_t.

Setting the NoUnsignedWrap flag is theoretically useful in general, and is
specifically useful to the WebAssembly backend, since it permits stronger
constant offset folding.

Differential Revision: http://reviews.llvm.org/D15544

llvm-svn: 256890
2016-01-06 00:43:06 +00:00
Nicolai Haehnle 6035504ab3 AMDGPU/SI: Do not move scratch resource register on Tonga & Iceland
Due to the SGPR init bug, every program claims to use the same number
of SGPRs anyway, so there's no point in trying to shift those registers
down from their initial spot of reservation.

Add a test that uses VGPR spilling and blocks most SGPRs from being used for
the scratch resource register. Previously, this would run into an assertion.

Differential Revision: http://reviews.llvm.org/D15724

llvm-svn: 256870
2016-01-05 20:42:49 +00:00
Michael Zuckerman 5cbae95916 [AVX512] add PSLLD and PSLLQ Intrinsic
Differential Revision: http://reviews.llvm.org/D15885

llvm-svn: 256840
2016-01-05 15:17:39 +00:00
MinSeong Kim a7385ebf78 [AArch64] Add support for Samsung Exynos-M1
Adds core tuning support for new Samsung Exynos-M1 core (ARMv8-A).

Differential Revision: http://reviews.llvm.org/D15663

llvm-svn: 256828
2016-01-05 12:51:59 +00:00
Tom Stellard 5cd09ade38 AMDGPU/SI: Select non-uniform constant addrspace loads to flat instructions for HSA
Summary: This fixes a regression caused by r256282.

Reviewers: arsenm, cfang

Subscribers: arsenm, llvm-commits

Differential Revision: http://reviews.llvm.org/D15736

llvm-svn: 256810
2016-01-05 03:40:16 +00:00
David Majnemer 869be0a4a6 Revert "[X86] Use push-pop for materializing small constants under 'minsize'"
The red zone consists of 128 bytes beyond the stack pointer so that the
allocation of objects in leaf functions doesn't require decrementing
rsp.  In r255656, we introduced an optimization that would cheaply
materialize certain constants via push/pop.  Push decrements the stack
pointer and stores it's result at what is now the top of the stack.
However, this means that using push/pop would encroach on the red zone.
PR26023 gives an example where this corrupts an object in the red zone.

llvm-svn: 256808
2016-01-05 02:32:06 +00:00
Matthias Braun d9fe082ba7 X86: Add a testcase for PR25951
llvm-svn: 256801
2016-01-05 00:48:16 +00:00
Matthias Braun 7e762e4f9c MachineInstrBundle: Fix reversed isSuperRegisterEq() call
Unfortunately this fix had the effect of exposing the
-verify-machineinstrs FIXME of X86InstrInfo.cpp in two testcases for
which I disabled it for now.
Two testcases also have additional pushq/popq where the corrected code
cannot prove that %rax is dead any longer. Looking at the examples, this
could potentially be fixed by improving computeRegisterLiveness() to check
the live-in lists of the successors blocks when reaching the end of a
block.

This fixes http://llvm.org/PR25951.

llvm-svn: 256799
2016-01-05 00:45:35 +00:00
Nicolai Haehnle 5b50497617 AMDGPU: add +xnack feature
Summary:
Enabling this feature will account for the two SGPRs used by the hardware
to store the XNACK_MASK physically.

The hardware only requires this reservation when the XNACK feature is
explicitly enabled. At some point, HSA will probably want to do that, but
it does increase SGPR register pressure, so leave it disabled by default
for now (but do add a small test).

Reviewers: arsenm, tstellarAMD

Subscribers: arsenm, llvm-commits

Differential Revision: http://reviews.llvm.org/D15869

llvm-svn: 256794
2016-01-04 23:35:53 +00:00
Simon Pilgrim e6955f3211 [X86][SSE] Ensure BLENDPD/BLENDPS/PBLEND inputs are both of the correct input type
llvm-svn: 256782
2016-01-04 21:41:11 +00:00
Geoff Berry 9e934b0cc2 [AArch64] Optimize some simple TBZ/TBNZ cases.
Summary:
Add some AArch64 dag combines to optimize some simple TBZ/TBNZ cases:

 (tbz (and x, m), b) -> (tbz x, b)
 (tbz (shl x, c), b) -> (tbz x, b-c)
 (tbz (shr x, c), b) -> (tbz x, b+c)
 (tbz (xor x, -1), b) -> (tbnz x, b)

Reviewers: jmolloy, mcrosier, t.p.northover

Subscribers: aemerson, rengolin, llvm-commits

Differential Revision: http://reviews.llvm.org/D15702

llvm-svn: 256765
2016-01-04 18:55:47 +00:00
Joseph Tremoulet 52f729a613 [WinEH] Update CoreCLR EH state numbering
Summary:
Fix the CLR state numbering to generate correct tables, and update the lit
test to verify them.

The CLR numbering assigns one state number to each catchpad and
cleanuppad.

It also computes two tree-like relations over states:
 1) Each state has a "HandlerParentState", which is the state of the next
    outer handler enclosing this state's handler (same as nearest ancestor
    per the ParentPad linkage on EH pads, but skipping over catchswitches).
 2) Each state has a "TryParentState", which:
    a) for a catchpad that's not the last handler on its catchswitch, is
       the state of the next catchpad on that catchswitch.
    b) for all other pads, is the state of the pad whose try region is the
       next outer try region enclosing this state's try region.  The "try
       regions are not present as such in the IR, but will be inferred
       based on the placement of invokes and pads which reach each other
       by exceptional exits.

Catchswitches do not get their own states, but each gets mapped to the
state of its first catchpad.

Table generation requires each state's "unwind dest" state to have a lower
state number than the given state.

Since HandlerParentState can be computed as a function of a pad's
ParentPad, and TryParentState can be computed as a function of its unwind
dest and the TryParentStates of its children, the CLR state numbering
algorithm first computes HandlerParentState in a top-down pass, then
computes TryParentState in a bottom-up pass.

Also reword some comments/names in the CLR EH table generation to make the
distinction between the different kinds of "parent" clear.


Reviewers: rnk, andrew.w.kaylor, majnemer

Subscribers: AndyAyers, llvm-commits

Differential Revision: http://reviews.llvm.org/D15325

llvm-svn: 256760
2016-01-04 16:16:01 +00:00
Michael Zuckerman cf0b6db9ef [AVX512] add PSRAD and PSRAQ Intrinsic
Differential Revision: http://reviews.llvm.org/D15851

llvm-svn: 256754
2016-01-04 13:45:45 +00:00
Michael Zuckerman 000fca44a8 [AVX512] add PSRAW Intrinsic
Differential Revision: http://reviews.llvm.org/D15850

llvm-svn: 256751
2016-01-04 12:50:36 +00:00
Michael Zuckerman 068bc2f219 [AVX512] add PSRLV Intrinsic
Differential Revision: http://reviews.llvm.org/D15838

llvm-svn: 256747
2016-01-04 11:39:06 +00:00
Simon Pilgrim 6e69cbe342 [X86][MMX] Regenerated vector insertion test.
Shows the true horror of what is going on....

llvm-svn: 256713
2016-01-03 19:17:37 +00:00
Simon Pilgrim 3ac854931f [X86][SSE] Added tests for insertion of zero elements into vectors
Many of these could be much better if we just lowered them all as shuffles - especially for the 256-bit vectors.

llvm-svn: 256708
2016-01-03 17:33:32 +00:00
Dimitry Andric 227b928abc Fix several accidental DOS line endings in source files
Summary:
There are a number of files in the tree which have been accidentally checked in with DOS line endings.  Convert these to native line endings.

There are also a few files which have DOS line endings on purpose, and I have set the svn:eol-style property to 'CRLF' on those.

Reviewers: joerg, aaron.ballman

Subscribers: aaron.ballman, sanjoy, dsanders, llvm-commits

Differential Revision: http://reviews.llvm.org/D15848

llvm-svn: 256707
2016-01-03 17:22:03 +00:00
Simon Pilgrim 569106fe99 [X86][SSE41] Added test cases for improving insertps shuffles
As mentioned on D14261, an upcoming patch will improve combines of insertps instructions. 

llvm-svn: 256706
2016-01-03 17:14:15 +00:00
Simon Pilgrim d17a1df783 [X86][SSE] Added v4f32 shuffle with zero tests
This is mainly test cases for improvements to insertps matching, but pre-SSE41 shuffles could be improved as well

llvm-svn: 256705
2016-01-03 17:02:56 +00:00
Joseph Tremoulet 71e5676de4 [WinEH] Update catchrets with cloned successors
Summary:
Add a pass to update catchrets when their successors get cloned; the
existing pass doesn't catch these because it walks the funclet whose
blocks are being cloned but the catchret is in a child funclet.

Also update the test for removing incoming PHI values; when the
predecessor is a catchret, the relevant color is the catchret's parentPad,
not its block's color.


Reviewers: andrew.w.kaylor, rnk, majnemer

Subscribers: llvm-commits

Differential Revision: http://reviews.llvm.org/D15840

llvm-svn: 256689
2016-01-02 15:22:36 +00:00
David Majnemer 011980cd50 [X86] Add intrinsics for reading and writing to the flags register
LLVM's targets need to know if stack pointer adjustments occur after the
prologue.  This is needed to correctly determine if the red-zone is
appropriate to use or if a frame pointer is required.

Normally, LLVM can figure this out very precisely by reasoning about the
contents of the MachineFunction.  There is an interesting corner case:
inline assembly.

The vast majority of inline assembly which will perform a push or pop is
done so to pair up with pushf or popf as appropriate.  Unfortunately,
this inline assembly doesn't mark the stack pointer as clobbered
because, well, it isn't.  The stack pointer is decremented and then
immediately incremented.  Because of this, LLVM was changed in r256456
to conservatively assume that inline assembly contain a sequence of
stack operations.  This is unfortunate because the vast majority of
inline assembly will not end up manipulating the stack pointer in any
way at all.

Instead, let's provide a more principled solution: an intrinsic.
FWIW, other compilers (MSVC and GCC among them) also provide this
functionality as an intrinsic.

llvm-svn: 256685
2016-01-01 06:50:01 +00:00
Michael Zuckerman 0dc468880d [AVX512] add PSRLQ and PSRLD Intrinsic
Differential Revision: http://reviews.llvm.org/D15770

llvm-svn: 256673
2015-12-31 15:22:04 +00:00
Michael Kuperstein d36e24a166 [X86] Avoid folding scalar loads into unary sse intrinsics
Not folding these cases tends to avoid partial register updates:
sqrtss (%eax), %xmm0
Has a partial update of %xmm0, while
movss (%eax), %xmm0
sqrtss %xmm0, %xmm0
Has a clobber of the high lanes immediately before the partial update,
avoiding a potential stall.

Given this, we only want to fold when optimizing for size.
This is consistent with the patterns we already have for some of
the fp/int converts, and in X86InstrInfo::foldMemoryOperandImpl()

Differential Revision: http://reviews.llvm.org/D15741

llvm-svn: 256671
2015-12-31 09:45:16 +00:00
Asaf Badouh af6569afd2 [X86][PKU] Add {RD,WR}PKRU intrinsics
Differential Revision: http://reviews.llvm.org/D15808

llvm-svn: 256670
2015-12-31 08:31:13 +00:00
Michael Zuckerman 80821ee77c [AVX512] add PSRLW Intrinsic
Differential Revision: http://reviews.llvm.org/D15751

llvm-svn: 256558
2015-12-29 13:04:35 +00:00
Artyom Skrobov 2aca0c622a [Thumb] Fix assembler error 'cannot honor width suffix pop {lr}'
Summary:
* avoid generating POP {LR} in Thumb1 epilogues
* combine MOV LR, Rx + BX LR -> BX Rx in a peephole optimization pass
* combine POP {LR} + B + BX LR -> POP {PC} on v5T+

Test cases by Ana Pazos

Differential Revision: http://reviews.llvm.org/D15707

llvm-svn: 256523
2015-12-28 21:40:45 +00:00
Sanjay Patel b3c53e512f [x86] lower calls to fmin and llvm.minnum.* using minss/minsd/minps/minpd (PR24475)
This is a follow-on to:
http://reviews.llvm.org/rL255700
http://reviews.llvm.org/rL256454
http://reviews.llvm.org/rL256510

llvm-svn: 256522
2015-12-28 21:16:55 +00:00
Sanjay Patel 9da2b647c7 [x86] lower calls to fmax and llvm.maxnum.* using maxps/maxpd (PR24475)
This is a follow-on to:
http://reviews.llvm.org/rL255700
http://reviews.llvm.org/rL256454

llvm-svn: 256510
2015-12-28 19:20:19 +00:00
Sanjay Patel e0696ef613 Specify triple so 'make check' passes on darwin x86-64
The check lines were added with:
http://reviews.llvm.org/rL256458
http://reviews.llvm.org/rL256460

but on a darwin target, the output looks like:
  ## InlineAsm Start
  rorq  %rdi
  ## InlineAsm End
  ## InlineAsm Start
  rorq  %rsi
  ## InlineAsm End
  leaq  (%rsi,%rdi), %rax
  retq

llvm-svn: 256507
2015-12-28 18:28:44 +00:00
Michael Kuperstein 2ea81baf3a [X86] Better support for the MCU psABI (LLVM part)
This adds support for the MCU psABI in a way different from r251223 and r251224,
basically reverting most of these two patches. The problem with the approach
taken in r251223/4 is that it only handled libcalls that originated from the backend.
However, the mid-end also inserts quite a few libcalls and assumes these use the
platform's default calling convention.

The previous patch tried to insert inregs when necessary both in the FE and,
somewhat hackily, in the CG. Instead, we now define a new default calling convention
for the MCU, which doesn't use inreg marking at all, similarly to what x86-64 does.

Differential Revision: http://reviews.llvm.org/D15054

llvm-svn: 256494
2015-12-28 14:39:21 +00:00
Asaf Badouh fba562004b [X86][AVX512] Lower broadcast sub vector to vector inrtrinsics
lower broadcast<type>x<vector> to shuffles.
 there are two cases:
1.src is 128 bits and dest is 512 bits: in this case we will lower it to shuffle with imm = 0.
2.src is 256 bit and dest is 512 bits: in this case we will lower it to shuffle with imm = 01000100b (0x44) that way we will broadcast the 256bit source: ymm[0,1,2,3] => zmm[0,1,2,3,0,1,2,3] then it will mask it with the passthru value (in case it's mask op).



Differential Revision: http://reviews.llvm.org/D15790

llvm-svn: 256490
2015-12-28 08:26:26 +00:00
Asaf Badouh 5546f51011 [X86][AVX512] add fp scalar broadcast intrinsics
Differential Revision: http://reviews.llvm.org/D15790

llvm-svn: 256489
2015-12-28 08:09:25 +00:00
Igor Breger 756c289dd8 AVX512: Change VPMOVB2M DAG lowering , use CVT2MASK node instead TRUNCATE.
Fix TRUNCATE lowering vector to vector i1, use LSB and not MSB.
Implement VPMOVB/W/D/Q2M intrinsic.

Differential Revision: http://reviews.llvm.org/D15675

llvm-svn: 256470
2015-12-27 13:56:16 +00:00
David Majnemer 2f2625c056 Make the test properly constrained
llvm-svn: 256460
2015-12-27 06:26:41 +00:00
David Majnemer 0b3661b7ce Try to passify buildbot
llvm-svn: 256458
2015-12-27 06:18:48 +00:00
David Majnemer 334676355a [X86, Win64] Use a frame pointer if pushf is emitted
A frame pointer must be used if stack pointer is modified after the
prologue.  LLVM will emit pushf/popf if we need to save/restore the
FLAGS register, requiring us to have a frame pointer for the function.

There is a small twist: this sequence might exist in user code via
inline-assembly.  For now, conservatively assume that such functions
require a frame pointer.  For real world justification, please see
clang's implementation of __readeflags.

This fixes PR25945.

llvm-svn: 256456
2015-12-27 06:07:26 +00:00
David Majnemer 081e8fe4c0 [WinEH] Add comments explaining the EH tables
This is aids in debugging WinEH, similar functionality is present for
DWARF EH.

llvm-svn: 256455
2015-12-27 06:07:12 +00:00
Sanjay Patel bcff3f7d92 [x86] lower calls to llvm.maxnum.v4f32 using maxps
This is a follow-on to:
http://reviews.llvm.org/rL255700

llvm-svn: 256454
2015-12-26 21:44:55 +00:00
Chen Li d71999ef1b [gc.statepoint] Change gc.statepoint intrinsic's return type to token type instead of i32 type
Summary: This patch changes gc.statepoint intrinsic's return type to token type instead of i32 type. Using token types could prevent LLVM to merge different gc.statepoint nodes into PHI nodes and cause further problems with gc relocations. The patch also changes the way on how gc.relocate and gc.result look for their corresponding gc.statepoint on unwind path. The current implementation uses the selector value extracted from a { i8*, i32 } landingpad as a hook to find the gc.statepoint, while the patch directly uses a token type landingpad (http://reviews.llvm.org/D15405) to find the gc.statepoint. 

Reviewers: sanjoy, JosephTremoulet, pgavlin, igor-laevsky, mjacob

Subscribers: reames, mjacob, sanjoy, llvm-commits

Differential Revision: http://reviews.llvm.org/D15662

llvm-svn: 256443
2015-12-26 07:54:32 +00:00
Craig Topper afefb1f13b Add test case for r256433. "[X86] Fix shuffle decoding for variable VPERMIL to be tolerant of the Constant type not matching due to folding in the constant pool and to get VPERMILPD correct."
llvm-svn: 256435
2015-12-26 04:58:05 +00:00
Craig Topper ff04fb0487 Revert r256432 "Test"
This is the test case for r256433, but it got committed incorrectly in my local repo.

llvm-svn: 256434
2015-12-26 04:56:51 +00:00
Craig Topper 23561f8215 Test
llvm-svn: 256432
2015-12-26 04:50:01 +00:00
Dan Gohman 8887d1faed [WebAssembly] Fix handling of COPY instructions in WebAssemblyRegStackify.
Move RegStackify after coalescing and teach it to use LiveIntervals instead
of depending on SSA form. This avoids a problem where a register in a COPY
instruction is stackified and then subsequently coalesced with a register
that is not stackified.

This also puts it after the scheduler, which allows us to simplify the
EXPR_STACK constraint, as we no longer have instructions being reordered
after stackification and before coloring.

llvm-svn: 256402
2015-12-25 00:31:02 +00:00
Elena Demikhovsky 9e225a2f52 AVX-512: Kreg set 0/1 optimization
The patterns that set a mask register to 0/1
KXOR %kn, %kn, %kn / KXNOR %kn, %kn, %kn
are replaced with
KXOR %k0, %k0, %kn / KXNOR %k0, %k0, %kn - AVX-512 targets optimization.

KNL does not recognize dependency-breaking idioms for mask registers,
so kxnor %k1, %k1, %k2 has a RAW dependence on %k1.
Using %k0 as the undef input register is a performance heuristic based
on the assumption that %k0 is used less frequently than the other mask
registers, since it is not usable as a write mask.

Differential Revision: http://reviews.llvm.org/D15739

llvm-svn: 256365
2015-12-24 08:12:22 +00:00
Igor Breger 268f6f53c5 AVX512: VPMOVM2B/W/D/Q intrinsic implementation.
Differential Revision: http://reviews.llvm.org//D15747

llvm-svn: 256364
2015-12-24 07:11:53 +00:00
JF Bastien 3e9f10ad3d WebAssembly: remove 'external' from test
Summary: Linker testing was sad at seeing an unresolved external symbol. For now don't do that: it's valid but we're not playing with multi-file linking yet, and the LLVM tests are used as hacky sanity tests for single-file linking (the GCC torture tests are much better for this purpose). Another solution would be to use '.extern' to make the intent explicit (don't simple-file link this, there's an unresolved symbol), some assemblers use '.extern' while others ignore it, so we wouldn't really be inventing anything new.

Reviewers: sunfish, kripken

Subscribers: jfb, llvm-commits, dschuff

Differential Revision: http://reviews.llvm.org/D15753

llvm-svn: 256353
2015-12-23 23:56:13 +00:00
Philip Reames cb0f947a2a [Statepoints] Use Indirect operands for spill slots
Teach the statepoint lowering code to emit Indirect stackmap entries for spill inserted by StatepointLowering (i.e. SelectionDAG), but Direct stackmap entries for in-IR allocas which represent manual stack slots. This is what the docs call for (http://llvm.org/docs/StackMaps.html#stack-map-format), but we've been emitting both as Direct. This was pointed out recently on the mailing list as a bug. It also blocks http://reviews.llvm.org/D15632 which extends the lowering to handle vector-of-pointers since only Indirect references can encode a variable sized slot.

To implement this, I introduced a new flag on the StackObject class used to maintian information about stack slots. I original considered (and prototyped in http://reviews.llvm.org/D15632), the idea of using the existing isSpillSlot flag, but end up deciding that was a bit too risky and that the cost of adding a new flag was low. Having the new flag will also allow us - in the future - to emit better comments in verbose assembly which indicate where a particular stack spill around a call comes from. (deopt, gc, regalloc).

Differential Revision: http://reviews.llvm.org/D15759

llvm-svn: 256352
2015-12-23 23:44:28 +00:00
Simon Pilgrim 17377bdd45 [X86][AVX] Only shuffle the lower half of vectors if the upper half is undefined
First step towards making better use of AVX's implicit zeroing of the upper half of a 256-bit vector by instructions that only act on the lower 128-bit vector - discussed on D14151.

As well as the fact that 128-bit shuffle instructions are generally more capable, this can be performant for older CPUs with 128-bit ALUs (e.g. Jaguar, Sandy Bridge) that must treat 256-bit vectors as multiple micro-ops.

Moved the similar subvector extraction shuffle combines from PerformShuffleCombine256 to lowerVectorShuffle as well.

Note: I've avoided combining shuffles that reference elements from the upper halves of the input vectors - this may be reviewed in future work as well (AVX1 would probably always gain, but AVX2 does have some cross-lane shuffle instructions).

Differential Revision: http://reviews.llvm.org/D15477

llvm-svn: 256332
2015-12-23 13:10:07 +00:00
Igor Breger 7b46b4e798 AVX512BW: Enable packed word shift for 512bit vector. Enable lowering scalar immidiate shift v64i8 .Fix predicate for AVX1/2 shifts.
Differential Revision: http://reviews.llvm.org/D15713

llvm-svn: 256324
2015-12-23 08:06:50 +00:00
David Majnemer c640f863e0 [WinEH] Don't visit the same catchswitch twice
We visited the same catchswitch twice because it was both the child of
another funclet and the predecessor of a cleanuppad.

Instead, change the numbering algorithm to only recurse if the unwind
destination of the inner funclet agrees with the unwind destination of
the catchswitch.

This fixes PR25926.

llvm-svn: 256317
2015-12-23 03:59:04 +00:00
Changpeng Fang b41574a961 AMDGPU/SI: Use flat for global load/store when targeting HSA
Summary:
  For some reason doing executing an MUBUF instruction with the addr64
  bit set and a zero base pointer in the resource descriptor causes
  the memory operation to be dropped when the shader is executed using
  the HSA runtime.

  This kind of MUBUF instruction is commonly used when the pointer is
  stored in VGPRs.  The base pointer field in the resource descriptor
  is set to zero and and the pointer is stored in the vaddr field.

  This patch resolves the issue by only using flat instructions for
  global memory operations when targeting HSA. This is an overly
  conservative fix as all other configurations of MUBUF instructions
  appear to work.

  NOTE: re-commit by fixing a failure in Codegen/AMDGPU/llvm.dbg.value.ll

Reviewers: tstellarAMD

Subscribers: arsenm, llvm-commits

Differential Revision: http://reviews.llvm.org/D15543

llvm-svn: 256282
2015-12-22 20:55:23 +00:00
Rafael Espindola 4b0d24c00a Revert "AMDGPU/SI: Use flat for global load/store when targeting HSA"
This reverts commit r256273.

It broke CodeGen/AMDGPU/llvm.dbg.value.ll

llvm-svn: 256275
2015-12-22 19:46:44 +00:00
Changpeng Fang 9b8a9be058 AMDGPU/SI: Use flat for global load/store when targeting HSA
Summary:
  For some reason doing executing an MUBUF instruction with the addr64
  bit set and a zero base pointer in the resource descriptor causes
  the memory operation to be dropped when the shader is executed using
  the HSA runtime.

  This kind of MUBUF instruction is commonly used when the pointer is
  stored in VGPRs.  The base pointer field in the resource descriptor
  is set to zero and and the pointer is stored in the vaddr field.

  This patch resolves the issue by only using flat instructions for
  global memory operations when targeting HSA. This is an overly
  conservative fix as all other configurations of MUBUF instructions
  appear to work.

Reviewers: tstellarAMD

Subscribers: arsenm, llvm-commits

Differential Revision: http://reviews.llvm.org/D15543

llvm-svn: 256273
2015-12-22 19:32:28 +00:00
Jun Bum Lim 6755c3bc5f [AArch64] Promote loads from stored
This is a recommit of r256004 which was reverted in r256160. The issue was the
incorrect promotion for half and byte loads transformed into mov instructions.
This fix will replace half and byte type loads only with bit field extracts.

Original commit message:

This change promotes load instructions which directly read from stored by
replacing them with mov instructions. If the store is wider than the load,
the load will be replaced with a bitfield extract.
For example :
  STRWui %W1, %X0, 1
  %W0 = LDRHHui %X0, 3
becomes
  STRWui %W1, %X0, 1
  %W0 = UBFMWri %W1, 16, 31

llvm-svn: 256249
2015-12-22 16:36:16 +00:00
Asaf Badouh 13ffa4bf7c [X86][AVX512] Add rcp14 and rsqrt14 intrinsics
Differential Revision: http://reviews.llvm.org/D15414

llvm-svn: 256237
2015-12-22 11:40:04 +00:00
Keno Fischer 4eccf11373 [ASMPrinter] Fix missing handling of DW_OP_bit_piece
In r256077, I added printing for DIExpressions in DEBUG_VALUE comments,
but neglected to handle DW_OP_bit_piece operands. Thanks to
Mikael Holmen and Joerg Sonnenberger for spotting this.

llvm-svn: 256236
2015-12-22 07:14:50 +00:00
David Majnemer ff1d084aa2 [MC] Don't use the architecture to govern which object file format to use
InitMCObjectFileInfo was trying to override the triple in awkward ways.
For example, a triple specifying COFF but not Windows was forced as ELF.
This makes it easy for internal invariants to get violated, such as
those which triggered PR25912.

This fixes PR25912.

llvm-svn: 256226
2015-12-22 01:39:04 +00:00
Jun Bum Lim a23e5f7516 Enhance BranchProbabilityInfo::calcUnreachableHeuristics for InvokeInst
This is recommit of r256028 with minor fixes in unittests:
  CodeGen/Mips/eh.ll
  CodeGen/Mips/insn-zero-size-bb.ll

Original commit message:

When identifying blocks post-dominated by an unreachable-terminated block
in BranchProbabilityInfo, consider only the edge to the normal destination
block if the terminator is InvokeInst and let calcInvokeHeuristics() decide
edge weights for the InvokeInst.

llvm-svn: 256202
2015-12-21 22:00:51 +00:00
Cong Hou 8df93ce455 [X86][SSE] Transform truncations between vectors of integers into X86ISD::PACKUS/PACKSS operations during DAG combine.
This patch transforms truncation between vectors of integers into
X86ISD::PACKUS/PACKSS operations during DAG combine. We don't do it in
lowering phase because after type legalization, the original truncation
will be turned into a BUILD_VECTOR with each element that is extracted
from a vector and then truncated, and from them it is difficult to do
this optimization. This greatly improves the performance of truncations
on some specific types.

Cost table is updated accordingly.


Differential revision: http://reviews.llvm.org/D14588

llvm-svn: 256194
2015-12-21 20:42:43 +00:00
Adrian Prantl 6b44541015 Convert the CodeGen/ARM/sched-it-debug-nodes.ll testcase from IR -> MIR.
NFC
PR24563

llvm-svn: 256187
2015-12-21 19:44:42 +00:00
Adrian Prantl 5d9acc2443 Teach ARMLoadStoreOptimizer to ignore DBG_VALUE instructions when merging
instructions.

As noted in PR24563.
rdar://problem/23963293

llvm-svn: 256183
2015-12-21 19:25:03 +00:00
Matthew Simpson 11c4de6054 [AArch64] Add additional extract-extend patterns for smov
This patch adds to the target description two additional patterns for matching
extract-extend operations to SMOV. The patterns catch the v16i8-to-i64 and
v8i16-to-i64 cases. The existing patterns miss these cases because the
extracted elements must first be legalized to i32, resulting in any_extend
nodes.

This was originally implemented as a DAG combine (r255895), but was reverted
due to failing out-of-tree tests.

llvm-svn: 256176
2015-12-21 18:31:25 +00:00
Jun Bum Lim 4bb171c8da Revert "[AArch64] Promote loads from stores"
This reverts commit r256004 due to a failure in cortex-a53.

llvm-svn: 256160
2015-12-21 15:36:49 +00:00
Chad Rosier d016574df8 [AArch64] Enable PostRAScheduler for AArch64 generic build.
Disable post-ra scheduler for perturbed tests to appease the bots and to
preserve the history of the tests.

http://reviews.llvm.org/D15652

llvm-svn: 256158
2015-12-21 14:43:45 +00:00
Igor Breger 44b60a3687 AVX512BW: Enable AND/OR/XOR vector byte/word paked operation by promoting to qword that natively suppored.
llvm-svn: 256157
2015-12-21 14:40:36 +00:00
Amjad Aboud 60b5e1b6c0 Implemented Support of IA interrupt and exception handlers:
http://lists.llvm.org/pipermail/cfe-dev/2015-September/045171.html

Differential Revision: http://reviews.llvm.org/D15567

llvm-svn: 256155
2015-12-21 14:07:14 +00:00
Craig Topper 074e845260 [X86] Prevent constant hoisting for a couple compare immediates that the selection DAG knows how to optimize into a shift.
This allows "icmp ugt %a, 4294967295" and "icmp uge %a, 4294967296" to be optimized into right shifts by 32 which can fold the immediate into the shift instruction. These patterns show up with some regularity in real code.

Unfortunately, since getImmCost can't see the icmp predicate we can't be tell if we're only catching these specific cases.

llvm-svn: 256126
2015-12-20 18:41:54 +00:00
Weiming Zhao 613c6862fa Fix mapping of @llvm.arm.ssat/usat intrinsics to ssat/usat instructions for Thumb2
Summary:
r250697 fixed the mapping for ARM mode. We have to do the same for Thumb2 otherwise the same llvm.arm.ssat() will generate different saturating amount for ARM and Thumb.

r250697: http://reviews.llvm.org/rL250697

Reviewers: rmaprath

Subscribers: aemerson, llvm-commits, rengolin

Differential Revision: http://reviews.llvm.org/D15653

llvm-svn: 256115
2015-12-20 06:41:44 +00:00
JF Bastien 374ea4bda5 WebAssembly: add vtable test
The test will mainly be useful to check that the .s file assembles and relocates properly because vtables reference functions in their data section.

llvm-svn: 256102
2015-12-19 18:55:18 +00:00
Keno Fischer f7346e0a6c Hopefully fix debug-info-blocks.ll test on win32 bot
llc_dwarf adds an mtriple, which forces this to use COFF, causing
the test to fail. Hopefully using regular llc without the triple
will work fine everywhere

llvm-svn: 256084
2015-12-19 03:32:23 +00:00
Keno Fischer 00cbf9a69a Clean up the processing of dbg.value in various places
Summary:
First up is instcombine, where in the dbg.declare -> dbg.value conversion,
the llvm.dbg.value needs to be called on the actual loaded value, rather
than the address (since the whole point of this transformation is to be
able to get rid of the alloca). Further, now that that's cleaned up, we
can remove a hack in the backend, that would add an implicit OP_deref if
the argument to dbg.value was an alloca. This stems from before the
existence of DIExpression and is no longer necessary since the deref can
be expressed explicitly.

Now, in order to make sure that the tests pass with this change, we need to
correct the printing of DEBUG_VALUE comments to take into account the
expression, which wasn't taken into account before.

Unfortunately, for both these changes, there were a number of incorrect
test cases (mostly the wrong number of DW_OP_derefs, but also a couple
where the test itself was broken more badly). aprantl and I have gone
through and adjusted these test case in order to make them pass with
these fixes and in some cases to make sure they're actually testing
what they are meant to test.

Reviewers: aprantl

Subscribers: dsanders

Differential Revision: http://reviews.llvm.org/D14186

llvm-svn: 256077
2015-12-19 02:02:44 +00:00
Matt Arsenault 2aed6ca1d3 AMDGPU: Switch barrier intrinsics to using convergent
noduplicate prevents unrolling of small loops that happen to have
barriers in them. If a loop has a barrier in it, it is OK to duplicate
it for the unroll.

llvm-svn: 256075
2015-12-19 01:46:41 +00:00
Matt Arsenault 10a509292c Fix broken type legalization of min/max
This was using an anyext when promoting the type
when zext/sext is required.

llvm-svn: 256074
2015-12-19 01:39:48 +00:00
Nicolai Haehnle dd58705af6 AMDGPU: fix overlapping copies in copyPhysReg
Summary:
When copying aggregate registers within the same register class, there may
be an overlap between source and destination that forces us to do the copy
backwards.

Do the simplest possible thing that guarantees the correct order of moves
when there are overlaps, and does whatever when there is no overlap. (The
last part forces some trivial adjustments to test cases.)

Together with r255906, this fixes a VM fault in Unreal Elemental Demo.

While at it, change the generation of kill and def flags to something that
looks more reasonable. This method is used very late during compilation, so
it probably doesn't matter in practice, and to be honest, I don't know if
this change is actually correct because the semantics in connection with
aggregate registers vs. sub-registers are not clear to me.

Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=93264

Reviewers: arsenm, tstellarAMD

Subscribers: arsenm, llvm-commits

Differential Revision: http://reviews.llvm.org/D15622

llvm-svn: 256072
2015-12-19 01:16:06 +00:00
Rafael Espindola 708a91a103 Revert "Enhance BranchProbabilityInfo::calcUnreachableHeuristics for InvokeInst"
This reverts commit r256028.

It broke:

    LLVM :: CodeGen/Mips/eh.ll
    LLVM :: CodeGen/Mips/insn-zero-size-bb.ll

llvm-svn: 256032
2015-12-18 21:23:32 +00:00
Jun Bum Lim 51a247065e Enhance BranchProbabilityInfo::calcUnreachableHeuristics for InvokeInst
When identifying blocks post-dominated by an unreachable-terminated block
in BranchProbabilityInfo, consider only the edge to the normal destination
block if the terminator is InvokeInst and let calcInvokeHeuristics() decide
edge weights for the InvokeInst.

llvm-svn: 256028
2015-12-18 20:53:47 +00:00
Krzysztof Parzyszek 21dc8bdd9e [Hexagon] Add PIC support
llvm-svn: 256025
2015-12-18 20:19:30 +00:00
Jun Bum Lim 3509d64c24 [AArch64] Promote loads from stores
This change promotes load instructions which directly read from stores by
replacing them with mov instructions. If the store is wider than the load,
the load will be replaced with a bitfield extract.
For example :
  STRWui %W1, %X0, 1
  %W0 = LDRHHui %X0, 3
becomes
  STRWui %W1, %X0, 1
  %W0 = UBFMWri %W1, 16, 31

llvm-svn: 256004
2015-12-18 18:08:30 +00:00
Hans Wennborg a6a2e512cf [X86] Use push-pop for materializing small constants under 'minsize'
Use the 3-byte (4 with REX prefix) push-pop sequence for materializing
small constants. This is smaller than using a mov (5, 6 or 7 bytes
depending on size and REX prefix), but it's likely to be slower, so
only used for 'minsize'.

This is a follow-up to r255656.

Differential Revision: http://reviews.llvm.org/D15549

llvm-svn: 255936
2015-12-17 23:18:39 +00:00
Matthew Simpson 13dddb0799 Revert "[AArch64] Add DAG combine for extract extend pattern"
This reverts commit r255895. The patch breaks internal tests. Reverting until a
fix is ready.

llvm-svn: 255928
2015-12-17 21:29:47 +00:00
Dan Gohman 670a60ed52 [WebAssembly] Switch WebAssemblyMCAsmInfo.h from MCAsmInfo to MCAsmInfoELF.
llvm-svn: 255925
2015-12-17 20:50:45 +00:00
Tom Stellard caaa3aa07c AMDGPU/SI: Reserve appropriate number of sgprs for flat scratch init.
Reviewers: tstellarAMD

Subscribers: arsenm, llvm-commits

Differential Revision: http://reviews.llvm.org/D15583

Patch by: Changpeng Fang

llvm-svn: 255908
2015-12-17 17:05:09 +00:00
Matthew Simpson 4355e404d5 [AArch64] Add DAG combine for extract extend pattern
This patch adds a DAG combine for (any_extend (extract_vector_elt v, i)) ->
(extract_vector_elt v, i). The combine enables us to better match some SMOV
patterns.

Differential Revision: http://reviews.llvm.org/D15515

llvm-svn: 255895
2015-12-17 14:30:55 +00:00
Alexey Bataev 7b72b658cc [X86] Add option for enabling LEA optimization pass, by Andrey Turetsky
Add option to enable/disable LEA optimization pass. By default the pass is disabled.
Differential Revision: http://reviews.llvm.org/D15573

llvm-svn: 255881
2015-12-17 07:34:39 +00:00
Cong Hou b9e8d483b5 Fix PR25838.
This is a quick fix to PR25838. The issue comes from the restriction that we
cannot normalize probabilities containing both known and unknown ones. A patch
that removes this restriction is under the review now:

http://reviews.llvm.org/D15548

llvm-svn: 255867
2015-12-17 01:29:08 +00:00
Dan Gohman 4172953813 [WebAssembly] Fix legalization of shift operators on large integer types.
llvm-svn: 255847
2015-12-16 23:25:51 +00:00
Derek Schuff 8bb5f2927a [WebAssembly] Implement eliminateCallFramePseudo
Summary:
Implement eliminateCallFramePsuedo to handle ADJCALLSTACKUP/DOWN
pseudo-instructions. Add a test calling a vararg function which causes non-0
adjustments. This revealed an issue with RegisterCoalescer wherein it
eliminates a COPY from SP32 to a vreg but failes to update the live ranges
of EXPR_STACK, causing a machineinstr verifier failure (so this test
is commented out).

Also add a dynamic alloca test, which causes a callseq_end dag node with
a 0 (instead of undef) second argument to be generated. We currently fail to
select that, so adjust the ADJCALLSTACKUP tablegen code to handle it.

Differential Revision: http://reviews.llvm.org/D15587

llvm-svn: 255844
2015-12-16 23:21:30 +00:00
Eric Christopher bfba572425 Fix funciton->function typo.
llvm-svn: 255841
2015-12-16 23:10:53 +00:00
Manman Ren cbe4f9417d CXX_FAST_TLS calling convention: performance improvement for AArch64.
The access function has a short entry and a short exit, the initialization
block is only run the first time. To improve the performance, we want to
have a short frame at the entry and exit.

We explicitly handle most of the CSRs via copies. Only the CSRs that are not
handled via copies will be in CSR_SaveList.

Frame lowering and prologue/epilogue insertion will generate a short frame
in the entry and exit according to CSR_SaveList. The majority of the CSRs will
be handled by register allcoator. Register allocator will try to spill and
reload them in the initialization block.

We add CSRsViaCopy, it will be explicitly handled during lowering.

1> we first set FunctionLoweringInfo->SplitCSR if conditions are met (the target
   supports it for the given machine function and the function has only return
   exits). We also call TLI->initializeSplitCSR to perform initialization.
2> we call TLI->insertCopiesSplitCSR to insert copies from CSRsViaCopy to
   virtual registers at beginning of the entry block and copies from virtual
   registers to CSRsViaCopy at beginning of the exit blocks.
3> we also need to make sure the explicit copies will not be eliminated.

The target independent portion was committed as r255353.
rdar://problem/23557469

Differential Revision: http://reviews.llvm.org/D15341

llvm-svn: 255821
2015-12-16 21:04:19 +00:00
Derek Schuff 45cd5a79b2 [WebAssembly] Print an extra local decl when the user stack pointer is used
Differential Revision: http://reviews.llvm.org/D15546

llvm-svn: 255815
2015-12-16 20:43:06 +00:00
Dan Gohman b3aa1ecab0 [WebAssembly] Fix the CFG Stackifier to handle unoptimized branches
If a branch both branches to and falls through to the same block, treat it as
an explicit branch.

llvm-svn: 255803
2015-12-16 19:06:41 +00:00
Dan Gohman e2831b4e27 [WebAssembly] Use the new offset syntax for memory operands in inline asm.
llvm-svn: 255788
2015-12-16 18:14:49 +00:00
Ulrich Weigand 47f3649374 [SystemZ] Fix assertion failure in adjustSubwordCmp
When comparing a zero-extended value against a constant small enough to
be in range of the inner type, it doesn't matter whether a signed or
unsigned compare operation (for the outer type) is being used.  This is
why the code in adjustSubwordCmp had this assertion:

    assert(C.ICmpType == SystemZICMP::Any &&
           "Signedness shouldn't matter here.");

assuming the the caller had already detected that fact.  However, it
turns out that there cases, in particular with always-true or always-
false conditions that have not been eliminated when compiling at -O0,
where this is not true.

Instead of failing an assertion if C.ICmpType is not SystemZICMP::Any
here, we can simply *set* it safely to SystemZICMP::Any, however.

llvm-svn: 255786
2015-12-16 18:04:06 +00:00
Tobias Edler von Koch b51460cf86 [Hexagon] Make memcpy lowering thread-safe
This removes an unpleasant hack involving a global variable for special
lowering of certain memcpy calls. These are now lowered as intended in
EmitTargetCodeForMemcpy in the same way that other targets do it.

llvm-svn: 255785
2015-12-16 17:29:37 +00:00
Dan Gohman 30a42bf585 [WebAssembly] Support more kinds of inline asm operands
llvm-svn: 255782
2015-12-16 17:15:17 +00:00
Michael Kuperstein e75e6e2a23 [X86] Improve shift combining
This folds (ashr (shl a, [56,48,32,24,16]), SarConst)
into       (shl, (sext (a), [56,48,32,24,16] - SarConst))
or into    (lshr, (sext (a), SarConst - [56,48,32,24,16]))
depending on sign of (SarConst - [56,48,32,24,16])

sexts in X86 are MOVs.
The MOVs have the same code size as above SHIFTs (only SHIFT by 1 has lower code size).
However the MOVs have 2 advantages to SHIFTs on x86:
1. MOVs can write to a register that differs from source.
2. MOVs accept memory operands.

This fixes PR24373.

Patch by: evgeny.v.stupachenko@intel.com
Differential Revision: http://reviews.llvm.org/D13161

llvm-svn: 255761
2015-12-16 11:22:37 +00:00
Chen Li e2222fdf95 Remove FileCheck from test case token_landingpad.ll.
The test case only needs to make sure it does not crash LLVM.

llvm-svn: 255755
2015-12-16 06:27:09 +00:00
Chen Li 3e0da9de77 Fixed test case in rL255749: [SelectionDAGBuilder] Adds support for landingpads of token type.
llvm-svn: 255750
2015-12-16 05:05:18 +00:00
Chen Li 3e8330a1fe [SelectionDAGBuilder] Adds support for landingpads of token type
Summary: This patch adds a check in visitLandingPad to see if landingpad's result type is token type. If so, do not create DAG nodes for its exception pointer and selector value. This patch enables the back end to handle landingpads of token type.

Reviewers: JosephTremoulet, majnemer, rnk

Subscribers: sanjoy, llvm-commits

Differential Revision: http://reviews.llvm.org/D15405

llvm-svn: 255749
2015-12-16 04:48:42 +00:00
Philip Reames 61a24ab6cc [IR] Add support for floating pointer atomic loads and stores
This patch allows atomic loads and stores of floating point to be specified in the IR and adds an adapter to allow them to be lowered via existing backend support for bitcast-to-equivalent-integer idiom.

Previously, the only way to specify a atomic float operation was to bitcast the pointer to a i32, load the value as an i32, then bitcast to a float. At it's most basic, this patch simply moves this expansion step to the point we start lowering to the backend.

This patch does not add canonicalization rules to convert the bitcast idioms to the appropriate atomic loads. I plan to do that in the future, but for now, let's simply add the support. I'd like to get instruction selection working through at least one backend (x86-64) without the bitcast conversion before canonicalizing into this form.

Similarly, I haven't yet added the target hooks to opt out of the lowering step I added to AtomicExpand. I figured it would more sense to add those once at least one backend (x86) was ready to actually opt out.

As you can see from the included tests, the generated code quality is not great. I plan on submitting some patches to fix this, but help from others along that line would be very welcome. I'm not super familiar with the backend and my ramp up time may be material.

Differential Revision: http://reviews.llvm.org/D15471

llvm-svn: 255737
2015-12-16 00:49:36 +00:00
Reid Kleckner 7850c9f5ca [WinEH] Make llvm.x86.seh.recoverfp work on x64
It adjusts from RSP-after-prologue to RBP, which is what SEH filters
need to do before they can use llvm.localrecover.

Fixes SEH filter captures, which were broken in r250088.

Issue reported by Alex Crichton.

llvm-svn: 255707
2015-12-15 23:40:58 +00:00
Tom Stellard 7750f4ed9e AMDGPU/SI: Set the code object work group segment size when targeting HSA
Reviewers: arsenm

Subscribers: arsenm, llvm-commits

Differential Revision: http://reviews.llvm.org/D15493

llvm-svn: 255702
2015-12-15 23:15:25 +00:00
Sanjay Patel 271efcdf20 [x86] inline calls to fmaxf / llvm.maxnum.f32 using maxss (PR24475)
This patch improves on the suggested codegen from PR24475:
https://llvm.org/bugs/show_bug.cgi?id=24475

but only for the fmaxf() case to start, so we can sort out any bugs before
extending to fmin, f64, and vectors.

The fmax / maxnum definitions provide us flexibility for signed zeros, so the
only thing we have to worry about in this replacement sequence is NaN handling.

Note 1: It may be better to implement this as lowerFMAXNUM(), but that exposes
a problem: SelectionDAGBuilder::visitSelect() transforms compare/select
instructions into FMAXNUM nodes if we declare FMAXNUM legal or custom. Perhaps
that should be checking for NaN inputs or global unsafe-math before transforming?
As it stands, that bypasses a big set of optimizations that the x86 backend 
already has in PerformSELECTCombine().

Note 2: The v2f32 test reveals another bug; the vector is extended to v4f32, so
we have completely unnecessary operations happening on undef elements of the 
vector.

Differential Revision: http://reviews.llvm.org/D15294

llvm-svn: 255700
2015-12-15 23:11:43 +00:00