Commit Graph

33992 Commits

Author SHA1 Message Date
Simon Pilgrim 3ac854931f [X86][SSE] Added tests for insertion of zero elements into vectors
Many of these could be much better if we just lowered them all as shuffles - especially for the 256-bit vectors.

llvm-svn: 256708
2016-01-03 17:33:32 +00:00
Dimitry Andric 227b928abc Fix several accidental DOS line endings in source files
Summary:
There are a number of files in the tree which have been accidentally checked in with DOS line endings.  Convert these to native line endings.

There are also a few files which have DOS line endings on purpose, and I have set the svn:eol-style property to 'CRLF' on those.

Reviewers: joerg, aaron.ballman

Subscribers: aaron.ballman, sanjoy, dsanders, llvm-commits

Differential Revision: http://reviews.llvm.org/D15848

llvm-svn: 256707
2016-01-03 17:22:03 +00:00
Simon Pilgrim 569106fe99 [X86][SSE41] Added test cases for improving insertps shuffles
As mentioned on D14261, an upcoming patch will improve combines of insertps instructions. 

llvm-svn: 256706
2016-01-03 17:14:15 +00:00
Simon Pilgrim d17a1df783 [X86][SSE] Added v4f32 shuffle with zero tests
This is mainly test cases for improvements to insertps matching, but pre-SSE41 shuffles could be improved as well

llvm-svn: 256705
2016-01-03 17:02:56 +00:00
Joseph Tremoulet 131a462690 [WinEH] Verify catchswitch handlers
Summary:
The handler list must be nonempty and consist solely of CatchPads.


Reviewers: rnk, andrew.w.kaylor, majnemer

Subscribers: llvm-commits

Differential Revision: http://reviews.llvm.org/D15842

llvm-svn: 256691
2016-01-02 15:25:25 +00:00
Joseph Tremoulet 06125e52a7 [WinEH] Tighten parentPad verifier checks
Summary: A catchswitch cannot be a parent of a cleanuppad or another catchswitch.

Reviewers: rnk, andrew.w.kaylor, majnemer

Subscribers: llvm-commits

Differential Revision: http://reviews.llvm.org/D15841

llvm-svn: 256690
2016-01-02 15:24:24 +00:00
Joseph Tremoulet 71e5676de4 [WinEH] Update catchrets with cloned successors
Summary:
Add a pass to update catchrets when their successors get cloned; the
existing pass doesn't catch these because it walks the funclet whose
blocks are being cloned but the catchret is in a child funclet.

Also update the test for removing incoming PHI values; when the
predecessor is a catchret, the relevant color is the catchret's parentPad,
not its block's color.


Reviewers: andrew.w.kaylor, rnk, majnemer

Subscribers: llvm-commits

Differential Revision: http://reviews.llvm.org/D15840

llvm-svn: 256689
2016-01-02 15:22:36 +00:00
David Majnemer 011980cd50 [X86] Add intrinsics for reading and writing to the flags register
LLVM's targets need to know if stack pointer adjustments occur after the
prologue.  This is needed to correctly determine if the red-zone is
appropriate to use or if a frame pointer is required.

Normally, LLVM can figure this out very precisely by reasoning about the
contents of the MachineFunction.  There is an interesting corner case:
inline assembly.

The vast majority of inline assembly which will perform a push or pop is
done so to pair up with pushf or popf as appropriate.  Unfortunately,
this inline assembly doesn't mark the stack pointer as clobbered
because, well, it isn't.  The stack pointer is decremented and then
immediately incremented.  Because of this, LLVM was changed in r256456
to conservatively assume that inline assembly contain a sequence of
stack operations.  This is unfortunate because the vast majority of
inline assembly will not end up manipulating the stack pointer in any
way at all.

Instead, let's provide a more principled solution: an intrinsic.
FWIW, other compilers (MSVC and GCC among them) also provide this
functionality as an intrinsic.

llvm-svn: 256685
2016-01-01 06:50:01 +00:00
Sanjay Patel bee05caa6b [LibCallSimplifier] propagate FMF when shrinking binary calls
llvm-svn: 256682
2015-12-31 23:40:59 +00:00
Sanjay Patel aa23114cb4 [LibCallSimplifier] propagate FMF when shrinking unary calls
llvm-svn: 256679
2015-12-31 21:52:31 +00:00
Sanjay Patel f6f32bcaa4 change function names to avoid accidentally matching the substring
llvm-svn: 256678
2015-12-31 21:25:25 +00:00
Sanjay Patel 4e8b300400 add 'fast' attribute to calls to show that the flag isn't being propagated
llvm-svn: 256677
2015-12-31 21:12:19 +00:00
Michael Zuckerman 0dc468880d [AVX512] add PSRLQ and PSRLD Intrinsic
Differential Revision: http://reviews.llvm.org/D15770

llvm-svn: 256673
2015-12-31 15:22:04 +00:00
Michael Kuperstein d36e24a166 [X86] Avoid folding scalar loads into unary sse intrinsics
Not folding these cases tends to avoid partial register updates:
sqrtss (%eax), %xmm0
Has a partial update of %xmm0, while
movss (%eax), %xmm0
sqrtss %xmm0, %xmm0
Has a clobber of the high lanes immediately before the partial update,
avoiding a potential stall.

Given this, we only want to fold when optimizing for size.
This is consistent with the patterns we already have for some of
the fp/int converts, and in X86InstrInfo::foldMemoryOperandImpl()

Differential Revision: http://reviews.llvm.org/D15741

llvm-svn: 256671
2015-12-31 09:45:16 +00:00
Asaf Badouh af6569afd2 [X86][PKU] Add {RD,WR}PKRU intrinsics
Differential Revision: http://reviews.llvm.org/D15808

llvm-svn: 256670
2015-12-31 08:31:13 +00:00
Sanjay Patel 41160c2094 [ValueTracking] fix bug computing isKnownToBeAPowerOfTwo() with arithmetic shift right (PR25900)
This is a fix for:
https://llvm.org/bugs/show_bug.cgi?id=25900

If we think that an arithmetic right shift of a power of two is always a power of two, 
an sdiv gets wrongly converted to udiv.

Differential Revision: http://reviews.llvm.org/D15827

llvm-svn: 256655
2015-12-30 22:40:52 +00:00
Geoff Berry 43dc285915 [JumpThreading] Fix opcode bonus in getJumpThreadDuplicationCost()
The code that was meant to adjust the duplication cost based on the
terminator opcode was not being executed in cases where the initial
threshold was hit inside the loop.

Subscribers: mcrosier, llvm-commits

Differential Revision: http://reviews.llvm.org/D15536

llvm-svn: 256568
2015-12-29 18:10:16 +00:00
Michael Zuckerman 80821ee77c [AVX512] add PSRLW Intrinsic
Differential Revision: http://reviews.llvm.org/D15751

llvm-svn: 256558
2015-12-29 13:04:35 +00:00
James Y Knight 992904a0af Fix gold test after r256465.
That commit added a new pass, and this test is sensitive to what the
first pass after verify is called.

llvm-svn: 256532
2015-12-29 03:48:37 +00:00
Eric Christopher 2ae7180b24 Accept dwarf version 5 for CIE versions.
llvm-svn: 256527
2015-12-28 23:02:42 +00:00
Artyom Skrobov 2aca0c622a [Thumb] Fix assembler error 'cannot honor width suffix pop {lr}'
Summary:
* avoid generating POP {LR} in Thumb1 epilogues
* combine MOV LR, Rx + BX LR -> BX Rx in a peephole optimization pass
* combine POP {LR} + B + BX LR -> POP {PC} on v5T+

Test cases by Ana Pazos

Differential Revision: http://reviews.llvm.org/D15707

llvm-svn: 256523
2015-12-28 21:40:45 +00:00
Sanjay Patel b3c53e512f [x86] lower calls to fmin and llvm.minnum.* using minss/minsd/minps/minpd (PR24475)
This is a follow-on to:
http://reviews.llvm.org/rL255700
http://reviews.llvm.org/rL256454
http://reviews.llvm.org/rL256510

llvm-svn: 256522
2015-12-28 21:16:55 +00:00
Manuel Jacob 9db5b93ffc [RS4GC] Fix rematerialization of bitcast of bitcast.
Summary:
Previously, only the outer (last) bitcast was rematerialized, resulting in a
use of the unrelocated inner (first) bitcast after the statepoint.  See the
test case for an example.

Reviewers: igor-laevsky, reames

Subscribers: reames, alex, llvm-commits, sanjoy

Differential Revision: http://reviews.llvm.org/D15789

llvm-svn: 256520
2015-12-28 20:14:05 +00:00
Elena Demikhovsky 5494698828 Implemented cost model for masked gather and scatter operations
The cost is calculated for all X86 targets. When gather/scatter instruction
is not supported we calculate the cost of scalar sequence.

Differential revision: http://reviews.llvm.org/D15677

llvm-svn: 256519
2015-12-28 20:10:59 +00:00
Sanjay Patel 9da2b647c7 [x86] lower calls to fmax and llvm.maxnum.* using maxps/maxpd (PR24475)
This is a follow-on to:
http://reviews.llvm.org/rL255700
http://reviews.llvm.org/rL256454

llvm-svn: 256510
2015-12-28 19:20:19 +00:00
Sanjay Patel e0696ef613 Specify triple so 'make check' passes on darwin x86-64
The check lines were added with:
http://reviews.llvm.org/rL256458
http://reviews.llvm.org/rL256460

but on a darwin target, the output looks like:
  ## InlineAsm Start
  rorq  %rdi
  ## InlineAsm End
  ## InlineAsm Start
  rorq  %rsi
  ## InlineAsm End
  leaq  (%rsi,%rdi), %rax
  retq

llvm-svn: 256507
2015-12-28 18:28:44 +00:00
Roman Divacky 73fc84761f Support clrex instruction on ARMv6k. Patch by Andrew Turner.
llvm-svn: 256505
2015-12-28 17:47:23 +00:00
Michael Kuperstein 2ea81baf3a [X86] Better support for the MCU psABI (LLVM part)
This adds support for the MCU psABI in a way different from r251223 and r251224,
basically reverting most of these two patches. The problem with the approach
taken in r251223/4 is that it only handled libcalls that originated from the backend.
However, the mid-end also inserts quite a few libcalls and assumes these use the
platform's default calling convention.

The previous patch tried to insert inregs when necessary both in the FE and,
somewhat hackily, in the CG. Instead, we now define a new default calling convention
for the MCU, which doesn't use inreg marking at all, similarly to what x86-64 does.

Differential Revision: http://reviews.llvm.org/D15054

llvm-svn: 256494
2015-12-28 14:39:21 +00:00
Asaf Badouh fba562004b [X86][AVX512] Lower broadcast sub vector to vector inrtrinsics
lower broadcast<type>x<vector> to shuffles.
 there are two cases:
1.src is 128 bits and dest is 512 bits: in this case we will lower it to shuffle with imm = 0.
2.src is 256 bit and dest is 512 bits: in this case we will lower it to shuffle with imm = 01000100b (0x44) that way we will broadcast the 256bit source: ymm[0,1,2,3] => zmm[0,1,2,3,0,1,2,3] then it will mask it with the passthru value (in case it's mask op).



Differential Revision: http://reviews.llvm.org/D15790

llvm-svn: 256490
2015-12-28 08:26:26 +00:00
Asaf Badouh 5546f51011 [X86][AVX512] add fp scalar broadcast intrinsics
Differential Revision: http://reviews.llvm.org/D15790

llvm-svn: 256489
2015-12-28 08:09:25 +00:00
Craig Topper c648c9b92d [AVX512] Bring vmovq instructions names into alignment with the AVX and SSE names. Add a missing encoding to disassembler and assembler.
I believe this also fixes a case where a 64-bit memory form that is documented as being unsupported in 32-bit mode was able to be selected there.

llvm-svn: 256483
2015-12-28 06:11:42 +00:00
Igor Breger 756c289dd8 AVX512: Change VPMOVB2M DAG lowering , use CVT2MASK node instead TRUNCATE.
Fix TRUNCATE lowering vector to vector i1, use LSB and not MSB.
Implement VPMOVB/W/D/Q2M intrinsic.

Differential Revision: http://reviews.llvm.org/D15675

llvm-svn: 256470
2015-12-27 13:56:16 +00:00
Chandler Carruth 3a040e6d47 [attrs] Extract the pure inference of function attributes into
a standalone pass.

There is no call graph or even interesting analysis for this part of
function attributes -- it is literally inferring attributes based on the
target library identification. As such, we can do it using a much
simpler module pass that just walks the declarations. This can also
happen much earlier in the pass pipeline which has benefits for any
number of other passes.

In the process, I've cleaned up one particular aspect of the logic which
was necessary in order to separate the two passes cleanly. It now counts
inferred attributes independently rather than just counting all the
inferred attributes as one, and the counts are more clearly explained.

The two test cases we had for this code path are both ... woefully
inadequate and copies of each other. I've kept the superset test and
updated it. We need more testing here, but I had to pick somewhere to
stop fixing everything broken I saw here.

Differential Revision: http://reviews.llvm.org/D15676

llvm-svn: 256466
2015-12-27 08:41:34 +00:00
Chandler Carruth f49f1a87ef [attrs] Split off the forced attributes utility into its own pass that
is (by default) run much earlier than FuncitonAttrs proper.

This allows forcing optnone or other widely impactful attributes. It is
also a bit simpler as the force attribute behavior needs no specific
iteration order.

I've added the pass into the default module pass pipeline and LTO pass
pipeline which mirrors where function attrs itself was being run.

Differential Revision: http://reviews.llvm.org/D15668

llvm-svn: 256465
2015-12-27 08:13:45 +00:00
David Majnemer 2f2625c056 Make the test properly constrained
llvm-svn: 256460
2015-12-27 06:26:41 +00:00
David Majnemer 0b3661b7ce Try to passify buildbot
llvm-svn: 256458
2015-12-27 06:18:48 +00:00
NAKAMURA Takumi c86e9b79b8 Prune the feature "tls". No one is using it since TLS is enabled for Cygwin.
llvm-svn: 256457
2015-12-27 06:14:33 +00:00
David Majnemer 334676355a [X86, Win64] Use a frame pointer if pushf is emitted
A frame pointer must be used if stack pointer is modified after the
prologue.  LLVM will emit pushf/popf if we need to save/restore the
FLAGS register, requiring us to have a frame pointer for the function.

There is a small twist: this sequence might exist in user code via
inline-assembly.  For now, conservatively assume that such functions
require a frame pointer.  For real world justification, please see
clang's implementation of __readeflags.

This fixes PR25945.

llvm-svn: 256456
2015-12-27 06:07:26 +00:00
David Majnemer 081e8fe4c0 [WinEH] Add comments explaining the EH tables
This is aids in debugging WinEH, similar functionality is present for
DWARF EH.

llvm-svn: 256455
2015-12-27 06:07:12 +00:00
Sanjay Patel bcff3f7d92 [x86] lower calls to llvm.maxnum.v4f32 using maxps
This is a follow-on to:
http://reviews.llvm.org/rL255700

llvm-svn: 256454
2015-12-26 21:44:55 +00:00
Benjamin Kramer 13dfb7df32 Fix safepoint intrinsic signatures in test.
Should bring back the bots after r256443.

llvm-svn: 256450
2015-12-26 11:40:48 +00:00
Chen Li d71999ef1b [gc.statepoint] Change gc.statepoint intrinsic's return type to token type instead of i32 type
Summary: This patch changes gc.statepoint intrinsic's return type to token type instead of i32 type. Using token types could prevent LLVM to merge different gc.statepoint nodes into PHI nodes and cause further problems with gc relocations. The patch also changes the way on how gc.relocate and gc.result look for their corresponding gc.statepoint on unwind path. The current implementation uses the selector value extracted from a { i8*, i32 } landingpad as a hook to find the gc.statepoint, while the patch directly uses a token type landingpad (http://reviews.llvm.org/D15405) to find the gc.statepoint. 

Reviewers: sanjoy, JosephTremoulet, pgavlin, igor-laevsky, mjacob

Subscribers: reames, mjacob, sanjoy, llvm-commits

Differential Revision: http://reviews.llvm.org/D15662

llvm-svn: 256443
2015-12-26 07:54:32 +00:00
Craig Topper afefb1f13b Add test case for r256433. "[X86] Fix shuffle decoding for variable VPERMIL to be tolerant of the Constant type not matching due to folding in the constant pool and to get VPERMILPD correct."
llvm-svn: 256435
2015-12-26 04:58:05 +00:00
Craig Topper ff04fb0487 Revert r256432 "Test"
This is the test case for r256433, but it got committed incorrectly in my local repo.

llvm-svn: 256434
2015-12-26 04:56:51 +00:00
Craig Topper 23561f8215 Test
llvm-svn: 256432
2015-12-26 04:50:01 +00:00
Dan Gohman 8887d1faed [WebAssembly] Fix handling of COPY instructions in WebAssemblyRegStackify.
Move RegStackify after coalescing and teach it to use LiveIntervals instead
of depending on SSA form. This avoids a problem where a register in a COPY
instruction is stackified and then subsequently coalesced with a register
that is not stackified.

This also puts it after the scheduler, which allows us to simplify the
EXPR_STACK constraint, as we no longer have instructions being reordered
after stackification and before coloring.

llvm-svn: 256402
2015-12-25 00:31:02 +00:00
Sanjay Patel ae945e7927 [InstCombine] transform more extract/insert pairs into shuffles (PR2109)
This is an extension of the shuffle combining from r203229:
http://reviews.llvm.org/rL203229

The idea is to widen a short input vector with undef elements so the
existing shuffle transform for extract/insert can kick in.

The motivation is to finally solve PR2109:
https://llvm.org/bugs/show_bug.cgi?id=2109

For that example, the IR becomes:

%1 = bitcast <2 x i32>* %P to <2 x float>*
%ld1 = load <2 x float>, <2 x float>* %1, align 8
%2 = shufflevector <2 x float> %ld1, <2 x float> undef, <4 x i32> <i32 0, i32 1, i32 undef, i32 undef>
%i2 = shufflevector <4 x float> %A, <4 x float> %2, <4 x i32> <i32 0, i32 1, i32 4, i32 5>
ret <4 x float> %i2

And x86 SSE output improves from:

movq	(%rdi), %xmm1           ## xmm1 = mem[0],zero
movdqa	%xmm1, %xmm2
shufps	$229, %xmm2, %xmm2      ## xmm2 = xmm2[1,1,2,3]
shufps	$48, %xmm0, %xmm1       ## xmm1 = xmm1[0,0],xmm0[3,0]
shufps	$132, %xmm1, %xmm0      ## xmm0 = xmm0[0,1],xmm1[0,2]
shufps	$32, %xmm0, %xmm2       ## xmm2 = xmm2[0,0],xmm0[2,0]
shufps	$36, %xmm2, %xmm0       ## xmm0 = xmm0[0,1],xmm2[2,0]
retq

To the almost optimal:

movhpd	(%rdi), %xmm0

Note: There's a tension in the existing transform related to generating
arbitrary shufflevector masks. We avoid that in other places in InstCombine
because we're scared that codegen can't handle strange masks, but it looks
like we're ok with producing those here. I purposely chose weird insert/extract
indexes for the regression tests to see the effect in these cases. 
For PowerPC+Altivec, AArch64, and X86+SSE/AVX, I think the codegen is equal or
better for these examples.

Differential Revision: http://reviews.llvm.org/D15096

llvm-svn: 256394
2015-12-24 21:17:56 +00:00
Asaf Badouh 9a5a83a518 [X86][PKU] Add {RD,WR}PKRU encoding
Differential Revision: http://reviews.llvm.org/D15711

llvm-svn: 256366
2015-12-24 08:25:00 +00:00
Elena Demikhovsky 9e225a2f52 AVX-512: Kreg set 0/1 optimization
The patterns that set a mask register to 0/1
KXOR %kn, %kn, %kn / KXNOR %kn, %kn, %kn
are replaced with
KXOR %k0, %k0, %kn / KXNOR %k0, %k0, %kn - AVX-512 targets optimization.

KNL does not recognize dependency-breaking idioms for mask registers,
so kxnor %k1, %k1, %k2 has a RAW dependence on %k1.
Using %k0 as the undef input register is a performance heuristic based
on the assumption that %k0 is used less frequently than the other mask
registers, since it is not usable as a write mask.

Differential Revision: http://reviews.llvm.org/D15739

llvm-svn: 256365
2015-12-24 08:12:22 +00:00
Igor Breger 268f6f53c5 AVX512: VPMOVM2B/W/D/Q intrinsic implementation.
Differential Revision: http://reviews.llvm.org//D15747

llvm-svn: 256364
2015-12-24 07:11:53 +00:00
Tom Stellard 5ebdfbe562 AMDGPU/SI: Fix encoding of flat instructions on VI
Reviewers: arsenm

Subscribers: arsenm, llvm-commits

Differential Revision: http://reviews.llvm.org/D15735

llvm-svn: 256360
2015-12-24 03:18:18 +00:00
JF Bastien 3e9f10ad3d WebAssembly: remove 'external' from test
Summary: Linker testing was sad at seeing an unresolved external symbol. For now don't do that: it's valid but we're not playing with multi-file linking yet, and the LLVM tests are used as hacky sanity tests for single-file linking (the GCC torture tests are much better for this purpose). Another solution would be to use '.extern' to make the intent explicit (don't simple-file link this, there's an unresolved symbol), some assemblers use '.extern' while others ignore it, so we wouldn't really be inventing anything new.

Reviewers: sunfish, kripken

Subscribers: jfb, llvm-commits, dschuff

Differential Revision: http://reviews.llvm.org/D15753

llvm-svn: 256353
2015-12-23 23:56:13 +00:00
Philip Reames cb0f947a2a [Statepoints] Use Indirect operands for spill slots
Teach the statepoint lowering code to emit Indirect stackmap entries for spill inserted by StatepointLowering (i.e. SelectionDAG), but Direct stackmap entries for in-IR allocas which represent manual stack slots. This is what the docs call for (http://llvm.org/docs/StackMaps.html#stack-map-format), but we've been emitting both as Direct. This was pointed out recently on the mailing list as a bug. It also blocks http://reviews.llvm.org/D15632 which extends the lowering to handle vector-of-pointers since only Indirect references can encode a variable sized slot.

To implement this, I introduced a new flag on the StackObject class used to maintian information about stack slots. I original considered (and prototyped in http://reviews.llvm.org/D15632), the idea of using the existing isSpillSlot flag, but end up deciding that was a bit too risky and that the cost of adding a new flag was low. Having the new flag will also allow us - in the future - to emit better comments in verbose assembly which indicate where a particular stack spill around a call comes from. (deopt, gc, regalloc).

Differential Revision: http://reviews.llvm.org/D15759

llvm-svn: 256352
2015-12-23 23:44:28 +00:00
Adrian Prantl 8e7d3b9402 llvm-dwarfdump: Add support for dumping .dSYM bundles.
This replicates the logic of Darwin dwarfdump for manually opening up
.dSYM bundles without introducing any new dependencies.
<rdar://problem/20491670>

llvm-svn: 256350
2015-12-23 21:51:13 +00:00
Simon Pilgrim 17377bdd45 [X86][AVX] Only shuffle the lower half of vectors if the upper half is undefined
First step towards making better use of AVX's implicit zeroing of the upper half of a 256-bit vector by instructions that only act on the lower 128-bit vector - discussed on D14151.

As well as the fact that 128-bit shuffle instructions are generally more capable, this can be performant for older CPUs with 128-bit ALUs (e.g. Jaguar, Sandy Bridge) that must treat 256-bit vectors as multiple micro-ops.

Moved the similar subvector extraction shuffle combines from PerformShuffleCombine256 to lowerVectorShuffle as well.

Note: I've avoided combining shuffles that reference elements from the upper halves of the input vectors - this may be reviewed in future work as well (AVX1 would probably always gain, but AVX2 does have some cross-lane shuffle instructions).

Differential Revision: http://reviews.llvm.org/D15477

llvm-svn: 256332
2015-12-23 13:10:07 +00:00
David Majnemer 2bc2538470 [OperandBundles] Have GlobalsModRef play nice with operand bundles
A call site's use of a Value might not correspond to an argument
operand but to a bundle operand.

llvm-svn: 256329
2015-12-23 09:58:46 +00:00
David Majnemer 63ad9e0543 [OperandBundles] Have TailCallElim play nice with operand bundles
A call site's use of a Value might not correspond to an argument
operand but to a bundle operand.

This fixes PR25928.

llvm-svn: 256328
2015-12-23 09:58:43 +00:00
David Majnemer 02f4787e45 [OperandBundles] Have InstCombine play nice with operand bundles
Don't assume a call's use corresponds to an argument operand, it might
correspond to a bundle operand.

llvm-svn: 256327
2015-12-23 09:58:41 +00:00
David Majnemer 464be3724a [OperandBundles] Have DeadArgElim play nice with operand bundles
A call site's use of a Value might not correspond to an argument
operand but to a bundle operand.

llvm-svn: 256326
2015-12-23 09:58:36 +00:00
Igor Breger 7b46b4e798 AVX512BW: Enable packed word shift for 512bit vector. Enable lowering scalar immidiate shift v64i8 .Fix predicate for AVX1/2 shifts.
Differential Revision: http://reviews.llvm.org/D15713

llvm-svn: 256324
2015-12-23 08:06:50 +00:00
David Majnemer c640f863e0 [WinEH] Don't visit the same catchswitch twice
We visited the same catchswitch twice because it was both the child of
another funclet and the predecessor of a cleanuppad.

Instead, change the numbering algorithm to only recurse if the unwind
destination of the inner funclet agrees with the unwind destination of
the catchswitch.

This fixes PR25926.

llvm-svn: 256317
2015-12-23 03:59:04 +00:00
Paul Robinson 22d0d31a72 Form reform for MCDwarf.
MCDwarf emits a canned abbreviation table, but was not emitting proper
forms for DWARF version 4, which is the default after r249655.

Differential Revision: http://reviews.llvm.org/D15732

llvm-svn: 256313
2015-12-23 01:57:31 +00:00
Manuel Jacob a4efd8ac2e [RS4GC] Fix base pair printing for constants.
Previously, "%" + name of the value was printed for each derived and base
pointer.  This is correct for instructions, but wrong for e.g. globals.

llvm-svn: 256305
2015-12-23 00:19:45 +00:00
Changpeng Fang b41574a961 AMDGPU/SI: Use flat for global load/store when targeting HSA
Summary:
  For some reason doing executing an MUBUF instruction with the addr64
  bit set and a zero base pointer in the resource descriptor causes
  the memory operation to be dropped when the shader is executed using
  the HSA runtime.

  This kind of MUBUF instruction is commonly used when the pointer is
  stored in VGPRs.  The base pointer field in the resource descriptor
  is set to zero and and the pointer is stored in the vaddr field.

  This patch resolves the issue by only using flat instructions for
  global memory operations when targeting HSA. This is an overly
  conservative fix as all other configurations of MUBUF instructions
  appear to work.

  NOTE: re-commit by fixing a failure in Codegen/AMDGPU/llvm.dbg.value.ll

Reviewers: tstellarAMD

Subscribers: arsenm, llvm-commits

Differential Revision: http://reviews.llvm.org/D15543

llvm-svn: 256282
2015-12-22 20:55:23 +00:00
Rafael Espindola 10d9a033db Also add unnamed_addr to functions.
llvm-svn: 256281
2015-12-22 20:43:30 +00:00
Rafael Espindola 5349d87a69 Delete dead GlobalAliases.
llvm-svn: 256276
2015-12-22 19:50:22 +00:00
Rafael Espindola 4b0d24c00a Revert "AMDGPU/SI: Use flat for global load/store when targeting HSA"
This reverts commit r256273.

It broke CodeGen/AMDGPU/llvm.dbg.value.ll

llvm-svn: 256275
2015-12-22 19:46:44 +00:00
Changpeng Fang 9b8a9be058 AMDGPU/SI: Use flat for global load/store when targeting HSA
Summary:
  For some reason doing executing an MUBUF instruction with the addr64
  bit set and a zero base pointer in the resource descriptor causes
  the memory operation to be dropped when the shader is executed using
  the HSA runtime.

  This kind of MUBUF instruction is commonly used when the pointer is
  stored in VGPRs.  The base pointer field in the resource descriptor
  is set to zero and and the pointer is stored in the vaddr field.

  This patch resolves the issue by only using flat instructions for
  global memory operations when targeting HSA. This is an overly
  conservative fix as all other configurations of MUBUF instructions
  appear to work.

Reviewers: tstellarAMD

Subscribers: arsenm, llvm-commits

Differential Revision: http://reviews.llvm.org/D15543

llvm-svn: 256273
2015-12-22 19:32:28 +00:00
Cong Hou e93b8e1539 [BPI] Replace weights by probabilities in BPI.
This patch removes all weight-related interfaces from BPI and replace
them by probability versions. With this patch, we won't use edge weight
anymore in either IR or MC passes. Edge probabilitiy is a better
representation in terms of CFG update and validation.


Differential revision: http://reviews.llvm.org/D15519 

llvm-svn: 256263
2015-12-22 18:56:14 +00:00
Manuel Jacob 4e4f60ded0 Remove deprecated llvm.experimental.gc.result.{int,float,ptr} intrinsics.
Summary:
These were deprecated 11 months ago when a generic
llvm.experimental.gc.result intrinsic, which works for all types, was added.

Reviewers: sanjoy, reames

Subscribers: sanjoy, chenli, llvm-commits

Differential Revision: http://reviews.llvm.org/D15719

llvm-svn: 256262
2015-12-22 18:44:45 +00:00
Manuel Jacob 990dfa6fe5 [RS4GC] Fix crash in the case that a live variable has a constant base.
Summary:
Previously, RS4GC crashed in CreateGCRelocates() because it assumed
that every base is also in the array of live variables, which isn't true if a
live variable has a constant base.

This change fixes the crash by making sure CreateGCRelocates() won't try to
relocate a live variable with a constant base.  This would be unnecessary
anyway because anything with a constant base won't move.

Reviewers: reames

Subscribers: llvm-commits, sanjoy

Differential Revision: http://reviews.llvm.org/D15556

llvm-svn: 256252
2015-12-22 16:50:44 +00:00
Jun Bum Lim 6755c3bc5f [AArch64] Promote loads from stored
This is a recommit of r256004 which was reverted in r256160. The issue was the
incorrect promotion for half and byte loads transformed into mov instructions.
This fix will replace half and byte type loads only with bit field extracts.

Original commit message:

This change promotes load instructions which directly read from stored by
replacing them with mov instructions. If the store is wider than the load,
the load will be replaced with a bitfield extract.
For example :
  STRWui %W1, %X0, 1
  %W0 = LDRHHui %X0, 3
becomes
  STRWui %W1, %X0, 1
  %W0 = UBFMWri %W1, 16, 31

llvm-svn: 256249
2015-12-22 16:36:16 +00:00
Asaf Badouh 13ffa4bf7c [X86][AVX512] Add rcp14 and rsqrt14 intrinsics
Differential Revision: http://reviews.llvm.org/D15414

llvm-svn: 256237
2015-12-22 11:40:04 +00:00
Keno Fischer 4eccf11373 [ASMPrinter] Fix missing handling of DW_OP_bit_piece
In r256077, I added printing for DIExpressions in DEBUG_VALUE comments,
but neglected to handle DW_OP_bit_piece operands. Thanks to
Mikael Holmen and Joerg Sonnenberger for spotting this.

llvm-svn: 256236
2015-12-22 07:14:50 +00:00
David Majnemer ff1d084aa2 [MC] Don't use the architecture to govern which object file format to use
InitMCObjectFileInfo was trying to override the triple in awkward ways.
For example, a triple specifying COFF but not Windows was forced as ELF.
This makes it easy for internal invariants to get violated, such as
those which triggered PR25912.

This fixes PR25912.

llvm-svn: 256226
2015-12-22 01:39:04 +00:00
Kostya Serebryany a9f3bc6d86 Partial fix for PR25912, see comment 13. Should fix the sanitizer bootstrap bot
llvm-svn: 256225
2015-12-22 01:18:49 +00:00
Teresa Johnson d213aa469e Handle empty Subprogram list when linking metadata.
Use an iterator that handles an empty subprogram list.

Fixes PR25915.

llvm-svn: 256224
2015-12-22 01:17:19 +00:00
Easwaran Raman bdb6f1dcc3 Determine callee's hotness and adjust threshold based on that. NFC.
This uses the same criteria used in CFE's CodeGenPGO to identify hot and cold
callees and uses values of inlinehint-threshold and inlinecold-threshold
respectively as the thresholds for such callees.

Differential Revision: http://reviews.llvm.org/D15245

llvm-svn: 256222
2015-12-22 00:32:35 +00:00
Evgeniy Stepanov 8827f2db85 [safestack] Add option for non-TLS unsafe stack pointer.
This patch adds an option, -safe-stack-no-tls, for using normal
storage instead of thread-local storage for the unsafe stack pointer.
This can be useful when SafeStack is applied to an operating system
kernel.

http://reviews.llvm.org/D15673

Patch by Michael LeMay.

llvm-svn: 256221
2015-12-22 00:13:11 +00:00
Xinliang David Li 5fe0455563 [PGO] Fix another comdat related issue for COFF
The linker requires that a comdat section must be associated
with a another comdat section that precedes it. This
means the comdat section's name needs to use the  profile name
var's name.

Patch tested by Johan Engelen.

llvm-svn: 256220
2015-12-22 00:11:15 +00:00
Xinliang David Li 14a97c26c8 Fix test case comment (NFC)
llvm-svn: 256206
2015-12-21 22:26:49 +00:00
Evgeniy Stepanov fda72c52a2 [cfi] Fix LowerBitSets on 32-bit targets.
This code attempts to truncate IntPtrTy to i32, which may be the same
type.

llvm-svn: 256205
2015-12-21 22:14:04 +00:00
David Majnemer 03e2cc3007 [MC, COFF] Support link /incremental conditionally
Today, we always take into account the possibility that object files
produced by MC may be consumed by an incremental linker.  This results
in us initialing fields which vary with time (TimeDateStamp) which harms
hermetic builds (e.g. verifying a self-host went well) and produces
sub-optimal code because we cannot assume anything about the relative
position of functions within a section (call sites can get redirected
through incremental linker thunks).

Let's provide an MCTargetOption which controls this behavior so that we
can disable this functionality if we know a-priori that the build will
not rely on /incremental.

llvm-svn: 256203
2015-12-21 22:09:27 +00:00
Jun Bum Lim a23e5f7516 Enhance BranchProbabilityInfo::calcUnreachableHeuristics for InvokeInst
This is recommit of r256028 with minor fixes in unittests:
  CodeGen/Mips/eh.ll
  CodeGen/Mips/insn-zero-size-bb.ll

Original commit message:

When identifying blocks post-dominated by an unreachable-terminated block
in BranchProbabilityInfo, consider only the edge to the normal destination
block if the terminator is InvokeInst and let calcInvokeHeuristics() decide
edge weights for the InvokeInst.

llvm-svn: 256202
2015-12-21 22:00:51 +00:00
Xinliang David Li ab361efee7 Resubmit r256193 with test fix: assertion failure analyzed
llvm-svn: 256201
2015-12-21 21:52:27 +00:00
Xinliang David Li 13da1f149e Revert r256193: build bot failure triggered
llvm-svn: 256198
2015-12-21 21:00:33 +00:00
Cong Hou 8df93ce455 [X86][SSE] Transform truncations between vectors of integers into X86ISD::PACKUS/PACKSS operations during DAG combine.
This patch transforms truncation between vectors of integers into
X86ISD::PACKUS/PACKSS operations during DAG combine. We don't do it in
lowering phase because after type legalization, the original truncation
will be turned into a BUILD_VECTOR with each element that is extracted
from a vector and then truncated, and from them it is difficult to do
this optimization. This greatly improves the performance of truncations
on some specific types.

Cost table is updated accordingly.


Differential revision: http://reviews.llvm.org/D14588

llvm-svn: 256194
2015-12-21 20:42:43 +00:00
Xinliang David Li 6c494cd0df [PGO] Fix profile var comdat generation problem with COFF
When targeting COFF, it is required that a comdat section to
have a global obj with the same name as the comdat (except for
comdats with select kind to be associative). This fix makes
sure that the comdat is keyed on the data variable for COFF.

Also improved test coverage for this.

llvm-svn: 256193
2015-12-21 20:41:20 +00:00
Michael Zolotukhin 0c97988e54 [ValueTracking] Properly handle non-sized types in isAligned function.
Reviewers: apilipenko, reames, sanjoy, hfinkel

Subscribers: llvm-commits

Differential Revision: http://reviews.llvm.org/D15597

llvm-svn: 256192
2015-12-21 20:38:18 +00:00
Adrian Prantl ce8581389b Fix PR24563 (LiveDebugVariables unconditionally propagates all DBG_VALUEs)
LiveDebugVariables unconditionally propagates all DBG_VALUE down the
dominator tree, which happens to work fine if there already is another
DBG_VALUE or the DBG_VALUE happends to describe a single-assignment vreg
but is otherwise wrong if the DBG_VALUE is coming from only one of the
predecessors.

In r255759 we introduced a proper data flow analysis scheduled after
LiveDebugVariables that correctly propagates DBG_VALUEs across basic block
boundaries. With the new pass in place, the incorrect propagation in
LiveDebugVariables can be retired witout loosing any of the benefits
where LiveDebugVariables happened to do the right thing.

llvm-svn: 256188
2015-12-21 20:03:00 +00:00
Adrian Prantl 6b44541015 Convert the CodeGen/ARM/sched-it-debug-nodes.ll testcase from IR -> MIR.
NFC
PR24563

llvm-svn: 256187
2015-12-21 19:44:42 +00:00
Adrian Prantl 5d9acc2443 Teach ARMLoadStoreOptimizer to ignore DBG_VALUE instructions when merging
instructions.

As noted in PR24563.
rdar://problem/23963293

llvm-svn: 256183
2015-12-21 19:25:03 +00:00
Tom Stellard 2b65ed306d AMDGPU/SI: Fix encoding for FLAT_SCRATCH registers on VI
Summary:
These register has different encodings on CI and VI, so we add pseudo
FLAT_SCRACTH registers to be used before MC, and subtarget specific
registers to be used by the MC layer.

Reviewers: arsenm

Subscribers: arsenm, llvm-commits

Differential Revision: http://reviews.llvm.org/D15661

llvm-svn: 256178
2015-12-21 18:44:27 +00:00
Matthew Simpson 11c4de6054 [AArch64] Add additional extract-extend patterns for smov
This patch adds to the target description two additional patterns for matching
extract-extend operations to SMOV. The patterns catch the v16i8-to-i64 and
v8i16-to-i64 cases. The existing patterns miss these cases because the
extracted elements must first be legalized to i32, resulting in any_extend
nodes.

This was originally implemented as a DAG combine (r255895), but was reverted
due to failing out-of-tree tests.

llvm-svn: 256176
2015-12-21 18:31:25 +00:00
Teresa Johnson 3e88a3200c Add testcase for r256161 (PR25907)
llvm-svn: 256174
2015-12-21 18:24:35 +00:00
Jun Bum Lim 4bb171c8da Revert "[AArch64] Promote loads from stores"
This reverts commit r256004 due to a failure in cortex-a53.

llvm-svn: 256160
2015-12-21 15:36:49 +00:00
Chad Rosier d016574df8 [AArch64] Enable PostRAScheduler for AArch64 generic build.
Disable post-ra scheduler for perturbed tests to appease the bots and to
preserve the history of the tests.

http://reviews.llvm.org/D15652

llvm-svn: 256158
2015-12-21 14:43:45 +00:00
Igor Breger 44b60a3687 AVX512BW: Enable AND/OR/XOR vector byte/word paked operation by promoting to qword that natively suppored.
llvm-svn: 256157
2015-12-21 14:40:36 +00:00
Amjad Aboud 60b5e1b6c0 Implemented Support of IA interrupt and exception handlers:
http://lists.llvm.org/pipermail/cfe-dev/2015-September/045171.html

Differential Revision: http://reviews.llvm.org/D15567

llvm-svn: 256155
2015-12-21 14:07:14 +00:00
Zlatko Buljan 5da2f6cd03 [mips][microMIPS] Implement DERET and DI instructions and check size operand for EXT and DEXT* instructions
Differential Revision: http://reviews.llvm.org/D15570

llvm-svn: 256152
2015-12-21 13:08:58 +00:00
NAKAMURA Takumi 8c6c95ad81 check-llvm: Tweak the feature "timestamps" for autoconf.
Note, ENABLE_TIMESTAMPS is either 1 or 0 in Makefile.config.

llvm-svn: 256138
2015-12-21 08:46:12 +00:00
David Majnemer 18663f8787 [MC, COFF] Unbreak support for COFF timestamps
Support for COFF timestamps was unintentionally broken in r246905 when
it was conditionally available depending on whether or not LLVM was
configured with LLVM_ENABLE_TIMESTAMPS.  However, Config/config.h was
never included which essentially broke the feature.  Due to lax testing,
the breakage was never identified until we observed strange failures
during incremental links of Chromium.

This issue is resolved by simply including Config/config.h in
WinCOFFObjectWriter and teaching lit that the MC/COFF/timestamp.s test
is conditionally supported depending on LLVM_ENABLE_TIMESTAMPS.  With
this in place, we can strengthen the test to ensure that it will not
accidentally get broken in the future.

This fixes PR25891.

llvm-svn: 256137
2015-12-21 08:03:07 +00:00
Craig Topper 074e845260 [X86] Prevent constant hoisting for a couple compare immediates that the selection DAG knows how to optimize into a shift.
This allows "icmp ugt %a, 4294967295" and "icmp uge %a, 4294967296" to be optimized into right shifts by 32 which can fold the immediate into the shift instruction. These patterns show up with some regularity in real code.

Unfortunately, since getImmCost can't see the icmp predicate we can't be tell if we're only catching these specific cases.

llvm-svn: 256126
2015-12-20 18:41:54 +00:00
Xinliang David Li 242ad0aded Fix a bug in test case -- duplicate entries
llvm-svn: 256117
2015-12-20 08:49:31 +00:00
Weiming Zhao 613c6862fa Fix mapping of @llvm.arm.ssat/usat intrinsics to ssat/usat instructions for Thumb2
Summary:
r250697 fixed the mapping for ARM mode. We have to do the same for Thumb2 otherwise the same llvm.arm.ssat() will generate different saturating amount for ARM and Thumb.

r250697: http://reviews.llvm.org/rL250697

Reviewers: rmaprath

Subscribers: aemerson, llvm-commits, rengolin

Differential Revision: http://reviews.llvm.org/D15653

llvm-svn: 256115
2015-12-20 06:41:44 +00:00
NAKAMURA Takumi 68369b1c2f Revert r219171, "llvm/test/lit.cfg: Suppress dwarf stuff for targeting x86_64-mingw32 while investigating since r219108."
It has been fixed since r219280 by David Majnemer.

llvm-svn: 256112
2015-12-20 03:48:23 +00:00
Sanjoy Das ab0626e35f Nonnull elements in OperandBundleCallSites are not all Instructions
`CloneAndPruneIntoFromInst` sometimes RAUW's dead instructions with
`undef` before erasing them (to avoid deleting instructions that still
have uses).  This changes the `WeakVH` in `OperandBundleCallSites` to
hold an `undef`, and we need to guard for this situation in eventuality
in `llvm::InlineFunction`.

llvm-svn: 256110
2015-12-19 22:40:28 +00:00
Sanjoy Das b496834f8e [Deopt bundles] Fix a test case
The `CHECK-NOT` line was incorrect, and would not have caught a
breakage.

llvm-svn: 256109
2015-12-19 22:40:22 +00:00
JF Bastien 374ea4bda5 WebAssembly: add vtable test
The test will mainly be useful to check that the .s file assembles and relocates properly because vtables reference functions in their data section.

llvm-svn: 256102
2015-12-19 18:55:18 +00:00
Manuel Jacob f4b3577d66 Remove double blanks. NFC.
llvm-svn: 256100
2015-12-19 18:26:53 +00:00
Keno Fischer f7346e0a6c Hopefully fix debug-info-blocks.ll test on win32 bot
llc_dwarf adds an mtriple, which forces this to use COFF, causing
the test to fail. Hopefully using regular llc without the triple
will work fine everywhere

llvm-svn: 256084
2015-12-19 03:32:23 +00:00
Tom Stellard ffc1a5aef7 AMDGPU/SI: Fix implemenation of isSourceOfDivergence() for graphics shaders
Summary:
The analysis of shader inputs was completely wrong.  We were passing the
wrong index to AttributeSet::hasAttribute() and the logic for which
inputs where in SGPRs was wrong too.

Reviewers: arsenm

Subscribers: arsenm, llvm-commits

Differential Revision: http://reviews.llvm.org/D15608

llvm-svn: 256082
2015-12-19 02:54:15 +00:00
Philip Reames 5d54689bca [RS4GC] Remove an overly strong assertion
As shown by the included test case, it's reasonable to end up with constant references during base pointer calculation.  The code actually handled this case just fine, we only had the assert to help isolate problems under the belief that constant references shouldn't be present in IR generated by managed frontends. This turned out to be wrong on two fronts: 1) Manual Jacobs is working on a language with constant references, and b) we found a case where the optimizer does create them in practice.

llvm-svn: 256079
2015-12-19 02:38:22 +00:00
Keno Fischer 00cbf9a69a Clean up the processing of dbg.value in various places
Summary:
First up is instcombine, where in the dbg.declare -> dbg.value conversion,
the llvm.dbg.value needs to be called on the actual loaded value, rather
than the address (since the whole point of this transformation is to be
able to get rid of the alloca). Further, now that that's cleaned up, we
can remove a hack in the backend, that would add an implicit OP_deref if
the argument to dbg.value was an alloca. This stems from before the
existence of DIExpression and is no longer necessary since the deref can
be expressed explicitly.

Now, in order to make sure that the tests pass with this change, we need to
correct the printing of DEBUG_VALUE comments to take into account the
expression, which wasn't taken into account before.

Unfortunately, for both these changes, there were a number of incorrect
test cases (mostly the wrong number of DW_OP_derefs, but also a couple
where the test itself was broken more badly). aprantl and I have gone
through and adjusted these test case in order to make them pass with
these fixes and in some cases to make sure they're actually testing
what they are meant to test.

Reviewers: aprantl

Subscribers: dsanders

Differential Revision: http://reviews.llvm.org/D14186

llvm-svn: 256077
2015-12-19 02:02:44 +00:00
Matt Arsenault 2aed6ca1d3 AMDGPU: Switch barrier intrinsics to using convergent
noduplicate prevents unrolling of small loops that happen to have
barriers in them. If a loop has a barrier in it, it is OK to duplicate
it for the unroll.

llvm-svn: 256075
2015-12-19 01:46:41 +00:00
Matt Arsenault 10a509292c Fix broken type legalization of min/max
This was using an anyext when promoting the type
when zext/sext is required.

llvm-svn: 256074
2015-12-19 01:39:48 +00:00
Nicolai Haehnle dd58705af6 AMDGPU: fix overlapping copies in copyPhysReg
Summary:
When copying aggregate registers within the same register class, there may
be an overlap between source and destination that forces us to do the copy
backwards.

Do the simplest possible thing that guarantees the correct order of moves
when there are overlaps, and does whatever when there is no overlap. (The
last part forces some trivial adjustments to test cases.)

Together with r255906, this fixes a VM fault in Unreal Elemental Demo.

While at it, change the generation of kill and def flags to something that
looks more reasonable. This method is used very late during compilation, so
it probably doesn't matter in practice, and to be honest, I don't know if
this change is actually correct because the semantics in connection with
aggregate registers vs. sub-registers are not clear to me.

Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=93264

Reviewers: arsenm, tstellarAMD

Subscribers: arsenm, llvm-commits

Differential Revision: http://reviews.llvm.org/D15622

llvm-svn: 256072
2015-12-19 01:16:06 +00:00
Rafael Espindola 2339ffed97 Deprecate a few C APIs.
This deprecates:
* LLVMParseBitcode
* LLVMParseBitcodeInContext
* LLVMGetBitcodeModuleInContext
* LLVMGetBitcodeModule

They are replaced with the functions with a 2 suffix which do not record
a diagnostic.

llvm-svn: 256065
2015-12-18 23:46:42 +00:00
Cong Hou fd0d62b87e Use getEdgeProbability() instead of getEdgeWeight() in BFI and remove getEdgeWeight() interfaces from MBPI.
This patch removes all getEdgeWeight() interfaces from CodeGen directory. As
getEdgeProbability() is a little more expensive than getEdgeWeight(), I will
compose a patch soon in which BPI only stores probabilities instead of edge
weights so that getEdgeProbability() will have O(1) time.


Differential revision: http://reviews.llvm.org/D15489

llvm-svn: 256039
2015-12-18 21:53:24 +00:00
Jingyue Wu 3f422280f5 [DivergenceAnalysis] fix a bug in computing influence regions
Fixes PR25864

llvm-svn: 256036
2015-12-18 21:44:26 +00:00
Jingyue Wu ba3ca76ed2 [NaryReassociate] allow candidate to have a different type
Summary:
If Candiadte may have a different type from GEP, we should bitcast or
pointer cast it to GEP's type so that the later RAUW doesn't complain.

Added a test in nary-gep.ll

Reviewers: tra, meheff

Subscribers: mcrosier, llvm-commits, jholewinski

Differential Revision: http://reviews.llvm.org/D15618

llvm-svn: 256035
2015-12-18 21:36:30 +00:00
Rafael Espindola 708a91a103 Revert "Enhance BranchProbabilityInfo::calcUnreachableHeuristics for InvokeInst"
This reverts commit r256028.

It broke:

    LLVM :: CodeGen/Mips/eh.ll
    LLVM :: CodeGen/Mips/insn-zero-size-bb.ll

llvm-svn: 256032
2015-12-18 21:23:32 +00:00
Jun Bum Lim 51a247065e Enhance BranchProbabilityInfo::calcUnreachableHeuristics for InvokeInst
When identifying blocks post-dominated by an unreachable-terminated block
in BranchProbabilityInfo, consider only the edge to the normal destination
block if the terminator is InvokeInst and let calcInvokeHeuristics() decide
edge weights for the InvokeInst.

llvm-svn: 256028
2015-12-18 20:53:47 +00:00
Krzysztof Parzyszek 21dc8bdd9e [Hexagon] Add PIC support
llvm-svn: 256025
2015-12-18 20:19:30 +00:00
Pete Cooper 98052537f0 Revert "Improve DWARFDebugFrame::parse to also handle __eh_frame."
This reverts commit r256008.

Its breaking multiple buildbots, although works for me locally.

llvm-svn: 256013
2015-12-18 19:45:38 +00:00
Pete Cooper 6c97f4c7d7 Improve DWARFDebugFrame::parse to also handle __eh_frame.
LLVM MC has single methods which can handle the output of EH frame and DWARF CIE's and FDE's.

This code improves DWARFDebugFrame::parse to do the same for parsing.

This also allows llvm-objdump to support the --dwarf=frames option which objdump supports.  This
option dumps the .eh_frame section using the new code in DWARFDebugFrame::parse.

http://reviews.llvm.org/D15535

Reviewed by Rafael Espindola.

llvm-svn: 256008
2015-12-18 18:51:08 +00:00
Andrew Kaylor 123048d26a [WinEH] Update LCSSA to handle catchswitch with handlers inside and outside a loop
Differential Revision: http://reviews.llvm.org/D15630

llvm-svn: 256005
2015-12-18 18:12:35 +00:00
Jun Bum Lim 3509d64c24 [AArch64] Promote loads from stores
This change promotes load instructions which directly read from stores by
replacing them with mov instructions. If the store is wider than the load,
the load will be replaced with a bitfield extract.
For example :
  STRWui %W1, %X0, 1
  %W0 = LDRHHui %X0, 3
becomes
  STRWui %W1, %X0, 1
  %W0 = UBFMWri %W1, 16, 31

llvm-svn: 256004
2015-12-18 18:08:30 +00:00
Teresa Johnson 0e7c82cb69 [ThinLTO/LTO] Don't link in unneeded metadata
Summary:
Third patch split out from http://reviews.llvm.org/D14752.

Only map in needed DISubroutine metadata (imported or otherwise linked
in functions and other DISubroutine referenced by inlined instructions).
This is supported for ThinLTO, LTO and llvm-link --only-needed, with
associated tests for each one.

Depends on D14838.

Reviewers: dexonsmith, joker.eph

Subscribers: davidxl, llvm-commits, joker.eph

Differential Revision: http://reviews.llvm.org/D14843

llvm-svn: 256003
2015-12-18 17:51:37 +00:00
Rafael Espindola 7a36355b21 Handle archives with paths in the names.
We always create archives with just he filename as the member name, but
other archives can put a more complicated path in there.

This patches handles it by computing just the filename as we do when
adding a new member.

If storing the path is important for some reason, we should probably
have an orthogonal option for doing that and do it for both old and new
members.

Fixes pr25877.

llvm-svn: 256001
2015-12-18 16:07:17 +00:00
Rafael Espindola f382b8836a Fix error handling in LLVMGetBitcodeModuleInContext.
It was not setting OutMessage.

llvm-svn: 255998
2015-12-18 13:58:05 +00:00
Vaivaswatha Nagaraj ed237938da GlobalsAA: Take advantage of ArgMemOnly, InaccessibleMemOnly and InaccessibleMemOrArgMemOnly attributes
Summary:
1. Modify AnalyzeCallGraph() to retain function info for external functions
if the function has [InaccessibleMemOr]ArgMemOnly flags.
2. When analyzing the use of a global is function parameter at a call site,
mark the callee also as modifying the global appropriately.
3. Add additional test cases.

Depends on D15499

Reviewers: hfinkel, jmolloy

Subscribers: llvm-commits

Differential Revision: http://reviews.llvm.org/D15605

llvm-svn: 255994
2015-12-18 11:02:52 +00:00
Zlatko Buljan 252cca555f [mips][microMIPS][DSP] Implement PACKRL.PH, PICK.PH, PICK.QB, SHILO, SHILOV and WRDSP instructions
Differential Revision: http://reviews.llvm.org/D14429

llvm-svn: 255991
2015-12-18 08:59:37 +00:00
Rafael Espindola 125592dc8d Add a test for LLVMGetBitcodeModule.
llvm-svn: 255985
2015-12-18 03:57:26 +00:00
Hans Wennborg a6a2e512cf [X86] Use push-pop for materializing small constants under 'minsize'
Use the 3-byte (4 with REX prefix) push-pop sequence for materializing
small constants. This is smaller than using a mov (5, 6 or 7 bytes
depending on size and REX prefix), but it's likely to be slower, so
only used for 'minsize'.

This is a follow-up to r255656.

Differential Revision: http://reviews.llvm.org/D15549

llvm-svn: 255936
2015-12-17 23:18:39 +00:00
Philip Reames d7a6cc859a [InstCombine] Extend peephole DSE to handle unordered atomics
This extends the same line of reasoning used in EarlyCSE w/http://reviews.llvm.org/D15352 to the DSE implementation in InstCombine.

Key points:
 * We only remove unordered or simple stores.
 * The loads producing values consumed by dead stores don't influence whether the store is dead.

Differential Revision: http://reviews.llvm.org/D15354

llvm-svn: 255932
2015-12-17 22:19:27 +00:00
JF Bastien d1fb58538f Polish atomic pointers
Summary:
I didn't realize that we already allowed atomic load/store of pointers,
it was added in 2012 by r162146. This patch updates the documentation
and tightens the verifier by using DataLayout to make sure that the
stored size is byte-sized and power-of-two. DataLayout is also used for
integers, and while I'm here I updated the corresponding code for
cmpxchg and rmw.

See the following discussion for context and upcoming changes to
add floating-point and vector atomics:
  https://groups.google.com/forum/#!topic/llvm-dev/Nh0P_E3CRoo/discussion

Reviewers: reames

Subscribers: llvm-commits

Differential Revision: http://reviews.llvm.org/D15512

llvm-svn: 255931
2015-12-17 22:09:19 +00:00
Rafael Espindola 762a6ac0b5 Pass -m elf_x84_64 to gold invocations.
Fixes pr25868.

llvm-svn: 255930
2015-12-17 21:56:27 +00:00
Matthew Simpson 13dddb0799 Revert "[AArch64] Add DAG combine for extract extend pattern"
This reverts commit r255895. The patch breaks internal tests. Reverting until a
fix is ready.

llvm-svn: 255928
2015-12-17 21:29:47 +00:00
Dan Gohman 670a60ed52 [WebAssembly] Switch WebAssemblyMCAsmInfo.h from MCAsmInfo to MCAsmInfoELF.
llvm-svn: 255925
2015-12-17 20:50:45 +00:00
Adrian Prantl b1be76389d Hardcode the target in this testcase — it depends on the ABI.
This fixes a failure on Windows buildbots.

llvm-svn: 255919
2015-12-17 19:33:56 +00:00
Philip Reames 15145fb7b1 [EarlyCSE] DSE of atomic unordered stores
The rules for removing trivially dead stores are a lot less complicated than loads. Since we know the later store post dominates the former and the former dominates the later, unless the former has side effects other than the actual store, we can remove it. One slightly surprising thing is that we can freely remove atomic stores, even if the later one isn't atomic. There's no guarantee the atomic one was every visible.

For the moment, we don't handle DSE of ordered atomic stores. We could extend the same chain of reasoning to them, but the catch is we'd then have to model the ordering effect without a store instruction. Since our fences are a stronger than our operation orderings, simple using a fence isn't an obvious win. This arguable calls for a refinement in our fence specification, but that's (much) later work.

Differential Revision: http://reviews.llvm.org/D15352

llvm-svn: 255914
2015-12-17 18:50:50 +00:00
Adrian Prantl 42c1e29236 make this test less whitespace-sensitive.
llvm-svn: 255913
2015-12-17 18:34:37 +00:00
Adrian Prantl 4293ee9d3d Rewrite test to use llvm-dwarfdump instead of checking for asm comments.
llvm-svn: 255912
2015-12-17 18:25:51 +00:00
Teresa Johnson e5a6191732 [ThinLTO] Metadata linking for imported functions
Summary:
Second patch split out from http://reviews.llvm.org/D14752.

Maps metadata as a post-pass from each module when importing complete,
suturing up final metadata to the temporary metadata left on the
imported instructions.

This entails saving the mapping from bitcode value id to temporary
metadata in the importing pass, and from bitcode value id to final
metadata during the metadata linking postpass.

Depends on D14825.

Reviewers: dexonsmith, joker.eph

Subscribers: davidxl, llvm-commits, joker.eph

Differential Revision: http://reviews.llvm.org/D14838

llvm-svn: 255909
2015-12-17 17:14:09 +00:00
Tom Stellard caaa3aa07c AMDGPU/SI: Reserve appropriate number of sgprs for flat scratch init.
Reviewers: tstellarAMD

Subscribers: arsenm, llvm-commits

Differential Revision: http://reviews.llvm.org/D15583

Patch by: Changpeng Fang

llvm-svn: 255908
2015-12-17 17:05:09 +00:00
Rafael Espindola f44db24e1f Avoid explicit relocation sorting most of the time.
These days relocations are created and stored in a deterministic way.
The order they are created is also suitable for the .o file, so we don't
need an explicit sort.

The last remaining exception is MIPS.

llvm-svn: 255902
2015-12-17 16:22:06 +00:00
Matthew Simpson 4355e404d5 [AArch64] Add DAG combine for extract extend pattern
This patch adds a DAG combine for (any_extend (extract_vector_elt v, i)) ->
(extract_vector_elt v, i). The combine enables us to better match some SMOV
patterns.

Differential Revision: http://reviews.llvm.org/D15515

llvm-svn: 255895
2015-12-17 14:30:55 +00:00
Alexey Bataev 7b72b658cc [X86] Add option for enabling LEA optimization pass, by Andrey Turetsky
Add option to enable/disable LEA optimization pass. By default the pass is disabled.
Differential Revision: http://reviews.llvm.org/D15573

llvm-svn: 255881
2015-12-17 07:34:39 +00:00
Cong Hou b9e8d483b5 Fix PR25838.
This is a quick fix to PR25838. The issue comes from the restriction that we
cannot normalize probabilities containing both known and unknown ones. A patch
that removes this restriction is under the review now:

http://reviews.llvm.org/D15548

llvm-svn: 255867
2015-12-17 01:29:08 +00:00
Dan Gohman 4172953813 [WebAssembly] Fix legalization of shift operators on large integer types.
llvm-svn: 255847
2015-12-16 23:25:51 +00:00
Derek Schuff 8bb5f2927a [WebAssembly] Implement eliminateCallFramePseudo
Summary:
Implement eliminateCallFramePsuedo to handle ADJCALLSTACKUP/DOWN
pseudo-instructions. Add a test calling a vararg function which causes non-0
adjustments. This revealed an issue with RegisterCoalescer wherein it
eliminates a COPY from SP32 to a vreg but failes to update the live ranges
of EXPR_STACK, causing a machineinstr verifier failure (so this test
is commented out).

Also add a dynamic alloca test, which causes a callseq_end dag node with
a 0 (instead of undef) second argument to be generated. We currently fail to
select that, so adjust the ADJCALLSTACKUP tablegen code to handle it.

Differential Revision: http://reviews.llvm.org/D15587

llvm-svn: 255844
2015-12-16 23:21:30 +00:00
Rafael Espindola 434e956181 Change linkInModule to take a std::unique_ptr.
Passing in a std::unique_ptr should help find errors when the module
is used after being linked into another module.

llvm-svn: 255842
2015-12-16 23:16:33 +00:00
Eric Christopher bfba572425 Fix funciton->function typo.
llvm-svn: 255841
2015-12-16 23:10:53 +00:00
NAKAMURA Takumi 6cd453b09d Move llvm/test/DebugInfo/live-debug-values.ll into X86, due to target triple.
llvm-svn: 255834
2015-12-16 22:44:10 +00:00
Nathan Slingerland 48dd080c77 [PGO] Handle and report overflow during profile merge for all types of data
Summary: Surface counter overflow when merging profile data. Merging still occurs on overflow but counts saturate to the maximum representable value. Overflow is reported to the user.

Reviewers: davidxl, dnovillo, silvas

Subscribers: llvm-commits

Differential Revision: http://reviews.llvm.org/D15547

llvm-svn: 255825
2015-12-16 21:45:43 +00:00
Manman Ren cbe4f9417d CXX_FAST_TLS calling convention: performance improvement for AArch64.
The access function has a short entry and a short exit, the initialization
block is only run the first time. To improve the performance, we want to
have a short frame at the entry and exit.

We explicitly handle most of the CSRs via copies. Only the CSRs that are not
handled via copies will be in CSR_SaveList.

Frame lowering and prologue/epilogue insertion will generate a short frame
in the entry and exit according to CSR_SaveList. The majority of the CSRs will
be handled by register allcoator. Register allocator will try to spill and
reload them in the initialization block.

We add CSRsViaCopy, it will be explicitly handled during lowering.

1> we first set FunctionLoweringInfo->SplitCSR if conditions are met (the target
   supports it for the given machine function and the function has only return
   exits). We also call TLI->initializeSplitCSR to perform initialization.
2> we call TLI->insertCopiesSplitCSR to insert copies from CSRsViaCopy to
   virtual registers at beginning of the entry block and copies from virtual
   registers to CSRsViaCopy at beginning of the exit blocks.
3> we also need to make sure the explicit copies will not be eliminated.

The target independent portion was committed as r255353.
rdar://problem/23557469

Differential Revision: http://reviews.llvm.org/D15341

llvm-svn: 255821
2015-12-16 21:04:19 +00:00
Derek Schuff 45cd5a79b2 [WebAssembly] Print an extra local decl when the user stack pointer is used
Differential Revision: http://reviews.llvm.org/D15546

llvm-svn: 255815
2015-12-16 20:43:06 +00:00
Reid Kleckner 187d33ee74 Revert "[ARM] Add ARMv8.2-A FP16 scalar instructions"
This reverts commit r255762.

llvm-svn: 255806
2015-12-16 19:21:03 +00:00
Dan Gohman b3aa1ecab0 [WebAssembly] Fix the CFG Stackifier to handle unoptimized branches
If a branch both branches to and falls through to the same block, treat it as
an explicit branch.

llvm-svn: 255803
2015-12-16 19:06:41 +00:00
Matt Arsenault e05ff15186 AMDGPU: Override getCFInstrCost
The default cost was 0 with the assumption that it is predictable.

llvm-svn: 255796
2015-12-16 18:37:19 +00:00
Reid Kleckner 83ebad370c Reland "[llvm-readobj] Simplify usage of -codeview flag"
Relands r255790 with fixed tests.

llvm-svn: 255793
2015-12-16 18:28:12 +00:00
Dan Gohman e2831b4e27 [WebAssembly] Use the new offset syntax for memory operands in inline asm.
llvm-svn: 255788
2015-12-16 18:14:49 +00:00
Ulrich Weigand 88a7a2eac7 [SystemZ] Sort relocs to avoid code corruption by linker optimization
The SystemZ linkers provide an optimization to transform a general-
or local-dynamic TLS sequence into an initial-exec sequence if possible.
Do do that, the compiler generates a function call to __tls_get_offset,
which is a brasl instruction annotated with *two* relocations:

- a R_390_PLT32DBL to install __tls_get_offset as branch target
- a R_390_TLS_GDCALL / R_390_TLS_LDCALL to inform the linker that
  the TLS optimization should be performed if possible

If the optimization is performed, the brasl is replaced by an ld load
instruction.

However, *both* relocs are processed independently by the linker.
Therefore it is crucial that the R_390_PLT32DBL is processed *first*
(installing the branch target for the brasl) and the R_390_TLS_GDCALL
is processed *second* (replacing the whole brasl with an ld).

If the relocs are swapped, the linker will first replace the brasl
with an ld, and *then* install the __tls_get_offset branch target
offset.  Since ld has a different layout than brasl, this may even
result in a completely different (or invalid) instruction; in any
case, the resulting code is corrupted.

Unfortunately, the way the MC common code sorts relocations causes
these two to *always* end up the wrong way around, resulting in
wrong code generation by the linker and crashes.

This patch overrides the sortRelocs routine to detect this particular
pair of relocs and enforce the required order.

llvm-svn: 255787
2015-12-16 18:12:40 +00:00
Ulrich Weigand 47f3649374 [SystemZ] Fix assertion failure in adjustSubwordCmp
When comparing a zero-extended value against a constant small enough to
be in range of the inner type, it doesn't matter whether a signed or
unsigned compare operation (for the outer type) is being used.  This is
why the code in adjustSubwordCmp had this assertion:

    assert(C.ICmpType == SystemZICMP::Any &&
           "Signedness shouldn't matter here.");

assuming the the caller had already detected that fact.  However, it
turns out that there cases, in particular with always-true or always-
false conditions that have not been eliminated when compiling at -O0,
where this is not true.

Instead of failing an assertion if C.ICmpType is not SystemZICMP::Any
here, we can simply *set* it safely to SystemZICMP::Any, however.

llvm-svn: 255786
2015-12-16 18:04:06 +00:00
Tobias Edler von Koch b51460cf86 [Hexagon] Make memcpy lowering thread-safe
This removes an unpleasant hack involving a global variable for special
lowering of certain memcpy calls. These are now lowered as intended in
EmitTargetCodeForMemcpy in the same way that other targets do it.

llvm-svn: 255785
2015-12-16 17:29:37 +00:00
Charlie Turner b69b92855d [NFC] Update horizontal reduction test cases.
These testcases no longer need to specify -slp-vectorize-hor, since it was
enabled by default in r252733.

llvm-svn: 255783
2015-12-16 17:22:24 +00:00
Dan Gohman 30a42bf585 [WebAssembly] Support more kinds of inline asm operands
llvm-svn: 255782
2015-12-16 17:15:17 +00:00
Vaivaswatha Nagaraj fb3f4907c0 Add InaccessibleMemOnly and inaccessibleMemOrArgMemOnly attributes
Summary:
This patch introduces two new function attributes 

InaccessibleMemOnly: This attribute indicates that the function may only access memory that is not accessible by the program/IR being compiled. This is a weaker form of ReadNone.
inaccessibleMemOrArgMemOnly: This attribute indicates that the function may only access memory that is either not accessible by the program/IR being compiled, or is pointed to by its pointer arguments. This is a weaker form of  ArgMemOnly

Test cases have been updated. This revision uses this (d001932f3a) as reference.

Reviewers: jmolloy, hfinkel

Subscribers: reames, joker.eph, llvm-commits

Differential Revision: http://reviews.llvm.org/D15499

llvm-svn: 255778
2015-12-16 16:16:19 +00:00
James Molloy 3d21dcf3ed [SimplifyCFG] Don't create unnecessary PHIs
In conditional store merging, we were creating PHIs when we didn't
need to. If the value to be predicated isn't defined in the block
we're predicating, then it doesn't need a PHI at all (because we only
deal with triangles and diamonds, any value not in the predicated BB
must dominate the predicated BB).

This fixes a large code size increase in some benchmarks in a popular embedded benchmark suite.

Now with a fix (and fixed tests) for the conformance issue seen in Chromium.

llvm-svn: 255767
2015-12-16 14:12:44 +00:00
Oliver Stannard 2de8c16913 [ARM] Add ARMv8.2-A FP16 vector instructions
ARMv8.2-A adds 16-bit floating point versions of all existing SIMD
floating-point instructions. This is an optional extension, so all of
these instructions require the FeatureFullFP16 subtarget feature.

Note that VFP without SIMD is not a valid combination for any version of
ARMv8-A, but I have ensured that these instructions all depend on both
FeatureNEON and FeatureFullFP16 for consistency.

Differential Revision: http://reviews.llvm.org/D15039

llvm-svn: 255764
2015-12-16 12:37:39 +00:00
Oliver Stannard 48568cbe18 [ARM] Add ARMv8.2-A FP16 scalar instructions
ARMv8.2-A adds 16-bit floating point versions of all existing VFP
floating-point instructions. This is an optional extension, so all of
these instructions require the FeatureFullFP16 subtarget feature.

The assembly for these instructions uses S registers (AArch32 does not
have H registers), but the instructions have ".f16" type specifiers
rather than ".f32" or ".f64". The top 16 bits of each source register
are ignored, and the top 16 bits of the destination register are set to
zero.

These instructions are mostly the same as the 32- and 64-bit versions,
but they use coprocessor 9 rather than 10 and 11.

Two new instructions, VMOVX and VINS, have been added to allow packing
and extracting two 16-bit floats stored in the top and bottom halves of
an S register.

New fixup kinds have been added for the PC-relative load and store
instructions, but no ELF relocations have been added as they have a
range of 512 bytes.

Differential Revision: http://reviews.llvm.org/D15038

llvm-svn: 255762
2015-12-16 11:35:44 +00:00
Michael Kuperstein e75e6e2a23 [X86] Improve shift combining
This folds (ashr (shl a, [56,48,32,24,16]), SarConst)
into       (shl, (sext (a), [56,48,32,24,16] - SarConst))
or into    (lshr, (sext (a), SarConst - [56,48,32,24,16]))
depending on sign of (SarConst - [56,48,32,24,16])

sexts in X86 are MOVs.
The MOVs have the same code size as above SHIFTs (only SHIFT by 1 has lower code size).
However the MOVs have 2 advantages to SHIFTs on x86:
1. MOVs can write to a register that differs from source.
2. MOVs accept memory operands.

This fixes PR24373.

Patch by: evgeny.v.stupachenko@intel.com
Differential Revision: http://reviews.llvm.org/D13161

llvm-svn: 255761
2015-12-16 11:22:37 +00:00
Vikram TV 859ad29b52 Recommit LiveDebugValues pass after fixing a couple of minor issues.
llvm-svn: 255759
2015-12-16 11:09:48 +00:00
Chen Li e2222fdf95 Remove FileCheck from test case token_landingpad.ll.
The test case only needs to make sure it does not crash LLVM.

llvm-svn: 255755
2015-12-16 06:27:09 +00:00
Chen Li 3e0da9de77 Fixed test case in rL255749: [SelectionDAGBuilder] Adds support for landingpads of token type.
llvm-svn: 255750
2015-12-16 05:05:18 +00:00
Chen Li 3e8330a1fe [SelectionDAGBuilder] Adds support for landingpads of token type
Summary: This patch adds a check in visitLandingPad to see if landingpad's result type is token type. If so, do not create DAG nodes for its exception pointer and selector value. This patch enables the back end to handle landingpads of token type.

Reviewers: JosephTremoulet, majnemer, rnk

Subscribers: sanjoy, llvm-commits

Differential Revision: http://reviews.llvm.org/D15405

llvm-svn: 255749
2015-12-16 04:48:42 +00:00
Philip Reames ae1f265bf1 [EarlyCSE] DSE of stores which write back loaded values
Extend EarlyCSE with an additional style of dead store elimination. If we write back a value just read from that memory location, we can eliminate the store under the assumption that the value hasn't changed.

I'm implementing this mostly because I noticed the omission when looking at the code. It seemed strange to have InstCombine have a peephole which was more powerful than EarlyCSE. :)

Differential Revision: http://reviews.llvm.org/D15397

llvm-svn: 255739
2015-12-16 01:01:30 +00:00
Philip Reames 61a24ab6cc [IR] Add support for floating pointer atomic loads and stores
This patch allows atomic loads and stores of floating point to be specified in the IR and adds an adapter to allow them to be lowered via existing backend support for bitcast-to-equivalent-integer idiom.

Previously, the only way to specify a atomic float operation was to bitcast the pointer to a i32, load the value as an i32, then bitcast to a float. At it's most basic, this patch simply moves this expansion step to the point we start lowering to the backend.

This patch does not add canonicalization rules to convert the bitcast idioms to the appropriate atomic loads. I plan to do that in the future, but for now, let's simply add the support. I'd like to get instruction selection working through at least one backend (x86-64) without the bitcast conversion before canonicalizing into this form.

Similarly, I haven't yet added the target hooks to opt out of the lowering step I added to AtomicExpand. I figured it would more sense to add those once at least one backend (x86) was ready to actually opt out.

As you can see from the included tests, the generated code quality is not great. I plan on submitting some patches to fix this, but help from others along that line would be very welcome. I'm not super familiar with the backend and my ramp up time may be material.

Differential Revision: http://reviews.llvm.org/D15471

llvm-svn: 255737
2015-12-16 00:49:36 +00:00
Mike Aizatsky 2ad650ea4f [sancov] blacklist support.
Summary:
Using the blacklist the user can filter own unwanted functions
from all outputs. By default blacklist contains "fun:__sancov*" line.

Differential Revision: http://reviews.llvm.org/D15364

llvm-svn: 255732
2015-12-16 00:31:48 +00:00
Peter Collingbourne 87debbbb89 Un-XFAIL JIT EH tests under [am]san.
These tests started passing after libcxxabi's r255559, which fixed a problem
relating to how libcxxabi links its EH library. The test failures were
caused by an issue with libc++, not the sanitizers (confirmed by building a
pre-r255559 revision with libc++/libc++abi and without sanitizers), so they
should never have been XFAILed under the sanitizers.

llvm-svn: 255708
2015-12-15 23:46:21 +00:00
Reid Kleckner 7850c9f5ca [WinEH] Make llvm.x86.seh.recoverfp work on x64
It adjusts from RSP-after-prologue to RBP, which is what SEH filters
need to do before they can use llvm.localrecover.

Fixes SEH filter captures, which were broken in r250088.

Issue reported by Alex Crichton.

llvm-svn: 255707
2015-12-15 23:40:58 +00:00
Tom Stellard 7750f4ed9e AMDGPU/SI: Set the code object work group segment size when targeting HSA
Reviewers: arsenm

Subscribers: arsenm, llvm-commits

Differential Revision: http://reviews.llvm.org/D15493

llvm-svn: 255702
2015-12-15 23:15:25 +00:00
Sanjay Patel 271efcdf20 [x86] inline calls to fmaxf / llvm.maxnum.f32 using maxss (PR24475)
This patch improves on the suggested codegen from PR24475:
https://llvm.org/bugs/show_bug.cgi?id=24475

but only for the fmaxf() case to start, so we can sort out any bugs before
extending to fmin, f64, and vectors.

The fmax / maxnum definitions provide us flexibility for signed zeros, so the
only thing we have to worry about in this replacement sequence is NaN handling.

Note 1: It may be better to implement this as lowerFMAXNUM(), but that exposes
a problem: SelectionDAGBuilder::visitSelect() transforms compare/select
instructions into FMAXNUM nodes if we declare FMAXNUM legal or custom. Perhaps
that should be checking for NaN inputs or global unsafe-math before transforming?
As it stands, that bypasses a big set of optimizations that the x86 backend 
already has in PerformSELECTCombine().

Note 2: The v2f32 test reveals another bug; the vector is extended to v4f32, so
we have completely unnecessary operations happening on undef elements of the 
vector.

Differential Revision: http://reviews.llvm.org/D15294

llvm-svn: 255700
2015-12-15 23:11:43 +00:00
Evgeniy Stepanov 67849d56c3 Cross-DSO control flow integrity (LLVM part).
An LTO pass that generates a __cfi_check() function that validates a
call based on a hash of the call-site-known type and the target
pointer.

llvm-svn: 255693
2015-12-15 23:00:08 +00:00
Tom Stellard a495307e5e AMDGPU/SI: Set the code objects private segment size when targeting HSA.
Summary: I'm not sure how things worked before without this.

Reviewers: arsenm

Subscribers: arsenm, llvm-commits

Differential Revision: http://reviews.llvm.org/D15492

llvm-svn: 255692
2015-12-15 22:55:30 +00:00
Cong Hou a73ffa2206 [LoopVectorizer] Refine loop vectorizer's register usage calculator by ignoring specific instructions.
(This is the third attempt to check in this patch, and the first two are r255454
and r255460. The once failed test file reg-usage.ll is now moved to
test/Transform/LoopVectorize/X86 directory with target datalayout and target
triple indicated.)

LoopVectorizationCostModel::calculateRegisterUsage() is used to estimate the
register usage for specific VFs. However, it takes into account many
instructions that won't be vectorized, such as induction variables,
GetElementPtr instruction, etc.. This makes the loop vectorizer too conservative
when choosing VF. In this patch, the induction variables that won't be
vectorized plus GetElementPtr instruction will be added to ValuesToIgnore set
so that their register usage won't be considered any more.


Differential revision: http://reviews.llvm.org/D15177

llvm-svn: 255691
2015-12-15 22:45:09 +00:00
Tom Stellard 29dd05e92f AMDGPU/SI: Emit constant variables in the .hsatext section when targeting HSA
Reviewers: arsenm

Subscribers: arsenm, llvm-commits

Differential Revision: http://reviews.llvm.org/D15426

llvm-svn: 255689
2015-12-15 22:39:36 +00:00
Dan Gohman 4b9d7916ee [WebAssembly] Implement instruction selection for constant offsets in addresses.
Add instruction patterns for matching load and store instructions with constant
offsets in addresses. The code is fairly redundant due to the need to replicate
everything between imm, tglobaldadr, and texternalsym, but this appears to be
common tablegen practice. The main alternative appears to be to introduce
matching functions with C++ code, but sticking with purely generated matchers
seems better for now.

Also note that this doesn't yet support offsets from getelementptr, which will
be the most common case; that will depend on a change in target-independent code
in order to set the NoUnsignedWrap flag, which I'll submit separately. Until
then, the testcase uses ptrtoint+add+inttoptr with a nuw on the add.

Also implement isLegalAddressingMode with an approximation of this.

Differential Revision: http://reviews.llvm.org/D15538

llvm-svn: 255681
2015-12-15 22:01:29 +00:00
David Majnemer 3bb88c0210 [WinEH] Use operand bundles to describe call sites
SimplifyCFG allows tail merging with code which terminates in
unreachable which, in turn, makes it possible for an invoke to end up in
a funclet which it was not originally part of.

Using operand bundles on invokes allows us to determine whether or not
an invoke was part of a funclet in the source program.

Furthermore, it allows us to unambiguously answer questions about the
legality of inlining into call sites which the personality may have
trouble with.

Differential Revision: http://reviews.llvm.org/D15517

llvm-svn: 255674
2015-12-15 21:27:27 +00:00
Xinliang David Li 2a27fc40a8 Test cleanup -- remove duplicate run lines
llvm-svn: 255673
2015-12-15 21:15:06 +00:00
Tom Stellard a6f24c6565 AMDGPU/SI: Select constant loads with non-uniform addresses to MUBUF instructions
Summary:
We were previously selecting all constant loads to SMRD instructions and legalizing
the SMRDs with non-uniform addresses during the SIFixSGPRCopesPass.

This new solution is more simple and also generates much better code, because
the instruction selector is able to take advantage of all the MUBUF addressing
modes that are legalization pass wasn't able to.

We also no longer need to generate v_add_* instructions when we
have a uniform pointer and a non-uniform offset, as this is now folded into the
MUBUF instruction during instruction selection.

Reviewers: arsenm

Subscribers: arsenm, llvm-commits

Differential Revision: http://reviews.llvm.org/D15425

llvm-svn: 255672
2015-12-15 20:55:55 +00:00
James Y Knight 33beb24318 [Sparc] Fix handling of double incoming arguments on sparc little-endian.
On SparcV8, doubles get passed in two 32-bit integer registers. The call
code was already handling endianness correctly, but the incoming
argument code was not -- it got the two halves in opposite order.

Also remove some dead code in LowerFormalArguments_32 to handle
less-than-32bit values, which can't actually happen.

Finally, add some test cases for the 32-bit calling convention, cribbed
from the 64abi.ll test, and run for both big and little-endian.

llvm-svn: 255668
2015-12-15 19:23:12 +00:00
Michael Kuperstein 53946bf8c6 [X86] MOVPC32r should only emit CFI adjustments when needed
We only want to emit CFI adjustments when actually using DWARF.
This fixes PR25828.

Differential Revision: http://reviews.llvm.org/D15522

llvm-svn: 255664
2015-12-15 18:50:32 +00:00
Sanjay Patel 38a022623a [SimplifyCFG] allow speculation of exactly one expensive instruction (PR24818)
This is the last general step to allow more IR-level speculation with a safety harness in place in CodeGenPrepare.

The intent is to restore the behavior enabled by:
http://reviews.llvm.org/rL228826

but prevent bad performance such as:
https://llvm.org/bugs/show_bug.cgi?id=24818

Earlier patches in this sequence:
D12882 (disable SimplifyCFG speculation for expensive instructions)
D13297 (have CGP despeculate expensive ops)
D14630 (have CGP despeculate special versions of cttz/ctlz)

As shown in the test cases, we only have two instructions currently affected: ctz for some x86 and fdiv generally. 
Allowing exactly one expensive instruction is a bit of a hack, but it lines up with what is currently implemented
in CGP. If we make the despeculation more general in CGP, we can make the speculation here more liberal.

A follow-up patch will adjust the cost for sqrt and possibly other typically expensive math intrinsics (currently
everything is cheap by default). GPU targets would likely want to override those expensive default costs (just as
they probably should already override the cost of div/rem) because just about any math is cheaper than control-flow
on those targets.

Differential Revision: http://reviews.llvm.org/D15213

llvm-svn: 255660
2015-12-15 17:38:29 +00:00
Nathan Slingerland 7f5b47ddd4 [llvm-profdata] Add support for weighted merge of profile data (2nd try)
Summary:
This change adds support for specifying a weight when merging profile data with the llvm-profdata tool.
Weights are specified by using the --weighted-input=<weight>,<filename> option. Input files not specified
with this option (normal positional list after options) are given a default weight of 1.

Adding support for arbitrary weighting of input profile data allows for relative importance to be placed on the
input data from multiple training runs.

Both sampled and instrumented profiles are supported.

Reviewers: davidxl, dnovillo, bogner, silvas

Subscribers: silvas, davidxl, llvm-commits

Differential Revision: http://reviews.llvm.org/D15306

llvm-svn: 255659
2015-12-15 17:37:09 +00:00
Nicolai Hahnle 78fd4f087b AMDGPU: mark ldexp LibCalls as unavailable
Summary:
The LibCallSimplifier will turn llvm.exp2.* intrinsics into ldexp* libcalls
which do not make sense with the AMDGPU backend.

In the long run, we'll want an llvm.ldexp.* intrinsic to properly make use of
this optimization, but this works around the problem for now.

See also: http://reviews.llvm.org/D14327 (suggested llvm.ldexp.* implementation)
Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=92709

Reviewers: arsenm, tstellarAMD

Differential Revision: http://reviews.llvm.org/D14990

llvm-svn: 255658
2015-12-15 17:24:15 +00:00
Hans Wennborg 08d5905bac [X86] Smaller code for materializing 32-bit 1 and -1 constants
"movl $-1, %eax" is 5 bytes, "xorl %eax, %eax; decl %eax" is 3 bytes.
This commit makes LLVM use the latter when optimizing for size.

Differential Revision: http://reviews.llvm.org/D14971

llvm-svn: 255656
2015-12-15 17:10:28 +00:00
Krzysztof Parzyszek 372bd80834 [Hexagon] Preprocess mapped instructions before lowering to MC
llvm-svn: 255653
2015-12-15 17:05:45 +00:00
Tom Stellard 43f52df0b5 AMDGPU/SI: Add llvm.amdgcn.mbcnt.* intrinsics
Summary:
These are meant to be used instead of the llvm.SI.tid intrinsic which will
be deprecated at some point.

Reviewers: arsenm

Subscribers: arsenm, llvm-commits

Differential Revision: http://reviews.llvm.org/D15475

llvm-svn: 255652
2015-12-15 17:02:52 +00:00
Tom Stellard ad7d03daa6 AMDGPU/SI: Add llvm.amdgcn.v.interp.p[12] intrinsics
Summary:
These are meant to be used instead of the llvm.SI.fs.interp intrinsic which
will be deprecated at some point.

Reviewers: arsenm

Subscribers: arsenm, llvm-commits

Differential Revision: http://reviews.llvm.org/D15474

llvm-svn: 255651
2015-12-15 17:02:49 +00:00
Nemanja Ivanovic 8922476bcb Bitcasts between FP and INT values using direct moves
This patch corresponds to review:
http://reviews.llvm.org/D15286

This patch was meant to land in revision 255246, but I accidentally uploaded
the patch that corresponds to http://reviews.llvm.org/D15372 in that revision
accidentally.

Thereby, this patch is the actual Bitcasts using direct moves patch, whereas
http://reviews.llvm.org/rL255246 actually corresponds to
http://reviews.llvm.org/D15372.

llvm-svn: 255649
2015-12-15 14:50:34 +00:00
Michael Kuperstein 801ee74167 Do not try to use i8 and i16 versions of FP_TO_U/SINT soft float library calls
It appears that neither compiler-rt nor the gnu soft-float libraries actually
implement these conversions. Instead of emitting calls to library functions
that don't exist, handle it similarly to the way we handle i8 -> float and
i16 -> float conversions: call the i32 library function, and adjust the type.

Differential Revision: http://reviews.llvm.org/D15151

llvm-svn: 255643
2015-12-15 12:55:50 +00:00
Cong Hou 3ba9cf6020 Improve the successor list update in TailDuplication.cpp.
This patch improves a temporary fix in r255530 so that we can normalize
successor list without trigger assertion failures in tail duplication pass.

llvm-svn: 255638
2015-12-15 10:10:40 +00:00
Elena Demikhovsky 6015f5c823 Type legalizer for masked gather and scatter intrinsics.
Full type legalizer that works with all vectors length - from 2 to 16, (i32, i64, float, double).

This intrinsic, for example
void @llvm.masked.scatter.v2f32(<2 x float>%data , <2 x float*>%ptrs , i32 align , <2 x i1>%mask )
requires type widening for data and type promotion for mask.

Differential Revision: http://reviews.llvm.org/D13633

llvm-svn: 255629
2015-12-15 08:40:41 +00:00
Quentin Colombet b82786e0ff [ShrinkWrapping] Do not choose restore point inside loops.
The post-dominance property is not sufficient to guarantee that a restore point
inside a loop is safe.
E.g.,
 while(1) {
   Save
   Restore
   if (...)
     break;
   use/def CSRs
 }
All the uses/defs of CSRs are dominated by Save and post-dominated
by Restore. However, the CSRs uses are still reachable after
Restore and before Save are executed.

This fixes PR25824

llvm-svn: 255613
2015-12-15 03:28:11 +00:00
Dan Gohman dcba338188 [WebAssembly] Remove .import printing.
For now, LLVM doesn't know about wasm module imports, so it shouldn't
emit .import directives.

llvm-svn: 255602
2015-12-15 02:20:44 +00:00
JF Bastien 65f0a71f40 WebAssembly: test global array indexing
This case was tested in the linker from code, but not from globals indexing into other globals. The linker currently barfs on this, ncbray volunteered to fix it.

llvm-svn: 255601
2015-12-15 02:02:51 +00:00
Mehdi Amini 1c131b37ed Instcombine: destructor loads of structs that do not contains padding
For non padded structs, we can just proceed and deaggregate them.
We don't want ot do this when there is padding in the struct as to not
lose information about this padding (the subsequents passes would then
try hard to preserve the padding, which is undesirable).

Also update extractvalue.ll and cast.ll so that they use structs with padding.

Remove the FIXME in the extractvalue of laod case as the non padded case is
handled when processing the load, and we don't want to do it on the padded
case.

Patch by: Amaury SECHET <deadalnix@gmail.com>

Differential Revision: http://reviews.llvm.org/D14483

From: Mehdi Amini <mehdi.amini@apple.com>
llvm-svn: 255600
2015-12-15 01:44:07 +00:00
Reid Kleckner 7c0c0c0501 [llvm-readobj] s/FunctionName/LinkageName/ for codeview dumping
The symbol being printed in this field comes from the main symbol table,
not 0xF1 subsection. Use LinkageName to make that a lot clearer.

llvm-svn: 255596
2015-12-15 01:23:55 +00:00
Xinliang David Li c7018a25c6 [PGO] make profile prefix even shorter and more readable
llvm-svn: 255586
2015-12-15 00:32:56 +00:00
Quentin Colombet 25b43f3624 [X86] Add relaxtion logic for SBB instructions.
Prior to this patch, we would wrongly stick to the variant with imm8 encoding
even when the relocation could not fit that size.

rdar://problem/23785506

llvm-svn: 255583
2015-12-15 00:09:23 +00:00
Xinliang David Li 0812747979 [PGO] Shorten profile symbol prefixes
Profile symbols have long prefixes which waste space and creating pressure for linker.
This patch shortens the prefixes to minimal length without losing verbosity.

Differential Revision: http://reviews.llvm.org/D15503

llvm-svn: 255575
2015-12-14 23:26:27 +00:00
Rafael Espindola 9d2bfc4874 Use diagnostic handler in the LLVMContext
This patch converts code that has access to a LLVMContext to not take a
diagnostic handler.

This has a few advantages

* It is easier to use a consistent diagnostic handler in a single program.
* Less clutter since we are not passing a handler around.

It does make it a bit awkward to implement some C APIs that return a
diagnostic string. I will propose new versions of these APIs and
deprecate the current ones.

llvm-svn: 255571
2015-12-14 23:17:03 +00:00
Quentin Colombet 2cb8a51c1f [X86] Add relaxtion logic for ADC instructions.
Prior to this patch, we would wrongly stick to the variant with imm8 encoding
even when the relocation could not fit that size.

rdar://problem/23785506

llvm-svn: 255570
2015-12-14 23:12:40 +00:00
Dan Gohman c7c0445443 [WebAssembly] Add type prefixes to call instructions
Add return type information to call and call_indirect instructions. This
allows them to be disambiguated without knowledge of the callee.

Differential Revision: http://reviews.llvm.org/D15484

llvm-svn: 255565
2015-12-14 22:56:51 +00:00
Dan Gohman 8fe7e86bf5 [WebAssembly] Implement a new algorithm for placing BLOCK markers
Implement a new BLOCK scope placement algorithm which better handles
early-return blocks and early exists from nested scopes.

Differential Revision: http://reviews.llvm.org/D15368

llvm-svn: 255564
2015-12-14 22:51:54 +00:00
Reid Kleckner db9a91e324 Revert "Don't create unnecessary PHIs"
This reverts commit r255489.

It causes test failures in Chromium and does not appear to respect the
AlternativeV parameter.

llvm-svn: 255562
2015-12-14 22:36:57 +00:00
Chih-Hung Hsieh 7993e18e80 [X86] Part 2 to fix x86-64 fp128 calling convention.
Part 1 was submitted in http://reviews.llvm.org/D15134.
Changes in this part:
* X86RegisterInfo.td, X86RecognizableInstr.cpp: Add FR128 register class.
* X86CallingConv.td: Pass f128 values in XMM registers or on stack.
* X86InstrCompiler.td, X86InstrInfo.td, X86InstrSSE.td:
  Add instruction selection patterns for f128.
* X86ISelLowering.cpp:
  When target has MMX registers, configure MVT::f128 in FR128RegClass,
  with TypeSoftenFloat action, and custom actions for some opcodes.
  Add missed cases of MVT::f128 in places that handle f32, f64, or vector types.
  Add TODO comment to support f128 type in inline assembly code.
* SelectionDAGBuilder.cpp:
  Fix infinite loop when f128 type can have
  VT == TLI.getTypeToTransformTo(Ctx, VT).
* Add unit tests for x86-64 fp128 type.

Differential Revision: http://reviews.llvm.org/D11438

llvm-svn: 255558
2015-12-14 22:08:36 +00:00
Sanjay Patel fa54acedd1 add fast-math-flags to 'call' instructions (PR21290)
This patch adds optional fast-math-flags (the same that apply to fmul/fadd/fsub/fdiv/frem/fcmp)
to call instructions in IR. Follow-up patches would use these flags in LibCallSimplifier, add 
support to clang, and extend FMF to the DAG for calls.

Motivating example:

%y = fmul fast float %x, %x
%z = tail call float @sqrtf(float %y)

We'd like to be able to optimize sqrt(x*x) into fabs(x). We do this today using a function-wide
attribute for unsafe-math, but we really want to trigger on the instructions themselves:

%z = tail call fast float @sqrtf(float %y)

because in an LTO build it's possible that calls with fast semantics have been inlined into a
function with non-fast semantics.

The code changes and tests are based on the recent commits that added "notail":
http://reviews.llvm.org/rL252368

and added FMF to fcmp:
http://reviews.llvm.org/rL241901

Differential Revision: http://reviews.llvm.org/D14707

llvm-svn: 255555
2015-12-14 21:59:03 +00:00
Pete Cooper 52abc5f971 Start implementing FDE dumping when printing the eh_frame.
This code adds some simple decoding of the FDE's in an eh_frame.

There's still more to be done in terms of error handling and verification.

Also, we need to be able to decode the CFI's.

llvm-svn: 255550
2015-12-14 21:49:49 +00:00
Pete Cooper 23bfa7e925 Print the eh_frame section in MachoDump.
This is the start of work to dump the contents of the eh_frame section.

It currently emits CIE entries.  FDE entries will come later.

It also needs improved error checking which will follow soon.

http://reviews.llvm.org/D15502

Reviewed by Kevin Enderby and Lang Hames.

llvm-svn: 255546
2015-12-14 21:39:27 +00:00
David Majnemer 59be1d653a [ConstantFold] Fix bitcast to gep constant folding transform.
Make sure to check that the destination type is sized.
A check was present but was incorrectly checking the source type
instead.

Patch by Amaury SECHET!

Differential Revision: http://reviews.llvm.org/D15264

llvm-svn: 255536
2015-12-14 19:30:32 +00:00
Sanjoy Das adfec011e1 [MergeFunctions] Use II instead of CI for InvokeInst; NFC
Using `CI` is slightly misleading.

llvm-svn: 255529
2015-12-14 19:11:45 +00:00
Sanjoy Das 2a74eb0000 Teach MergeFunctions about operand bundles
llvm-svn: 255528
2015-12-14 19:11:40 +00:00
Sanjoy Das 2de4d0aa18 Teach haveSameSpecialState about operand bundles
llvm-svn: 255527
2015-12-14 19:11:35 +00:00
Xinliang David Li e3bf4fd394 [PGO] Value profiling text format reader/writer support
This patch adds the missing functionality in parsable
text format support for value profiling.

Differential Revision: http://reviews.llvm.org/D15212

llvm-svn: 255523
2015-12-14 18:44:01 +00:00
David Majnemer bbfc7219ef [IR] Remove terminatepad
It turns out that terminatepad gives little benefit over a cleanuppad
which calls the termination function.  This is not sufficient to
implement fully generic filters but MSVC doesn't support them which
makes terminatepad a little over-designed.

Depends on D15478.

Differential Revision: http://reviews.llvm.org/D15479

llvm-svn: 255522
2015-12-14 18:34:23 +00:00
Paul Robinson accc3e0376 FastISel needs to remove dead code when it bails out.
When FastISel fails to translate an instruction it hands off code
generation to SelectionDAG. Before it does so, it may have generated
local value instructions to feed phi nodes in successor blocks. These
instructions will then be generated again by SelectionDAG, causing
duplication and less efficient code, including extra spill
instructions.

Patch by Wolfgang Pieb!

Differential Revision: http://reviews.llvm.org/D11768

llvm-svn: 255520
2015-12-14 18:33:18 +00:00
Petar Jovanovic 280f7101e8 [Power PC] llvm soft float support for ppc32
This is the second in a set of patches for soft float support for ppc32,
it enables soft float operations.

Patch by Strahinja Petrovic.

Differential Revision: http://reviews.llvm.org/D13700

llvm-svn: 255516
2015-12-14 17:57:33 +00:00
Matt Arsenault d079285e05 AMDGPU: Use generic bitreverse intrinsic
Also fix bug in vector legalization for bitreverse.

llvm-svn: 255512
2015-12-14 17:25:38 +00:00
Matt Arsenault 52a52a564b AMDGPU: Fix splitting vector loads with existing offsets
If the original MMO had an offset, it was dropped.
Also use the correct alignment after adding the new offset.

llvm-svn: 255508
2015-12-14 16:59:40 +00:00
Sanjay Patel f727e387be [InstCombine] fold trunc ([lshr] (bitcast vector) ) --> extractelement (PR25543)
This is a fix for PR25543:
https://llvm.org/bugs/show_bug.cgi?id=25543

The idea is to take the existing fold of:
bitcast ( trunc ( lshr ( bitcast X))) --> extractelement (bitcast X)
( http://reviews.llvm.org/rL112232 )

And break it into less specific transforms so we'll catch more cases such as
the example in the bug report:
bitcast ( trunc ( lshr ( bitcast X))) -->
bitcast ( extractelement (bitcast X)) -->
extractelement (bitcast X)

Enabling patches for this change:
http://reviews.llvm.org/rL255399 (combine bitcasts)
http://reviews.llvm.org/rL255433 (canonicalize extractelement(bitcast X))

Differential Revision: http://reviews.llvm.org/D15392

llvm-svn: 255504
2015-12-14 16:16:54 +00:00
Adhemerval Zanella d2b10c5e9a [sanitizer] [msan] VarArgHelper for AArch64
This patch add support for variadic argument for AArch64.  All the MSAN
unit tests are not passing as well the signal_stress_test (currently
set as XFAIl for aarch64).

llvm-svn: 255495
2015-12-14 14:14:15 +00:00
James Molloy 2b1e101e99 Don't create unnecessary PHIs
In conditional store merging, we were creating PHIs when we didn't
need to. If the value to be predicated isn't defined in the block
we're predicating, then it doesn't need a PHI at all (because we only
deal with triangles and diamonds, any value not in the predicated BB
must dominate the predicated BB).

This fixes a large code size increase in some benchmarks in a popular embedded benchmark suite.

llvm-svn: 255489
2015-12-14 10:57:01 +00:00
David Blaikie f5cb6279a6 [llvm-dwp] Deduplicate type units
It's O(N^2) because it does a simple walk through the existing types to
find duplicates, but that will be fixed in a follow-up commit to use a
mapping data structure of some kind.

llvm-svn: 255482
2015-12-14 07:42:00 +00:00
David Blaikie 429f8ca66d [llvm-dwp] Remove some unused test code
llvm-svn: 255481
2015-12-14 07:41:56 +00:00
Michael Zuckerman c5f47b3571 I Added a triple flag for x86-evenDirective test.
Continue of rL255461

Differential Revision: http://reviews.llvm.org/D15413

llvm-svn: 255469
2015-12-13 21:12:33 +00:00
Cong Hou ccec6e4d84 Revert r255460, which still causes test failures on some platforms.
Further investigation on the failures is ongoing.

llvm-svn: 255463
2015-12-13 17:15:38 +00:00
Michael Zuckerman 02ecd43c63 [X86][inline asm] support even directive
The .even directive aligns content to an evan-numbered address.

In at&t syntax .even 
In Microsoft syntax even (without the dot).

Differential Revision: http://reviews.llvm.org/D15413

llvm-svn: 255462
2015-12-13 17:07:23 +00:00
Cong Hou e6a210f50b [LoopVectorizer] Refine loop vectorizer's register usage calculator by ignoring specific instructions.
(This is the second attempt to check in this patch: REQUIRES: asserts is added
to reg-usage.ll now.)

LoopVectorizationCostModel::calculateRegisterUsage() is used to estimate the
register usage for specific VFs. However, it takes into account many
instructions that won't be vectorized, such as induction variables,
GetElementPtr instruction, etc.. This makes the loop vectorizer too conservative
when choosing VF. In this patch, the induction variables that won't be
vectorized plus GetElementPtr instruction will be added to ValuesToIgnore set
so that their register usage won't be considered any more.


Differential revision: http://reviews.llvm.org/D15177

llvm-svn: 255460
2015-12-13 16:55:46 +00:00
Cong Hou 7c369156eb Revert r255454 as it leads to several test failers on buildbots.
llvm-svn: 255456
2015-12-13 09:28:57 +00:00
Cong Hou 7f8b43d424 [LoopVectorizer] Refine loop vectorizer's register usage calculator by ignoring specific instructions.
LoopVectorizationCostModel::calculateRegisterUsage() is used to estimate the
register usage for specific VFs. However, it takes into account many
instructions that won't be vectorized, such as induction variables,
GetElementPtr instruction, etc.. This makes the loop vectorizer too conservative
when choosing VF. In this patch, the induction variables that won't be
vectorized plus GetElementPtr instruction will be added to ValuesToIgnore set
so that their register usage won't be considered any more.


Differential revision: http://reviews.llvm.org/D15177

llvm-svn: 255454
2015-12-13 08:44:08 +00:00
Saleem Abdulrasool 778c268594 ARM: only emit EABI attributes on EABI targets
EABI attributes should only be emitted on EABI targets.  This prevents the
emission of the optimization goals EABI attribute on Windows ARM.

llvm-svn: 255448
2015-12-13 05:27:45 +00:00
Simon Pilgrim 052191dd82 [X86][AVX512] Added support for VMOVQ shuffle comments
llvm-svn: 255442
2015-12-12 21:46:23 +00:00
Manuel Jacob 1578ec8860 Partially fix memcpy / memset / memmove lowering in SelectionDAG construction if address space != 0.
Summary:
Previously SelectionDAGBuilder asserted that the pointer operands of
memcpy / memset / memmove intrinsics are in address space < 256.  This assert
implicitly assumed the X86 backend, where all address spaces < 256 are
equivalent to address space 0 from the code generator's point of view.  On some
targets (R600 and NVPTX) several address spaces < 256 have a target-defined
meaning, so this assert made little sense for these targets.

This patch removes this wrong assertion and adds extra checks before lowering
these intrinsics to library calls.  If a pointer operand can't be casted to
address space 0 without changing semantics, a fatal error is reported to the
user.

The new behavior should be valid for all targets that give address spaces != 0
a target-specified meaning (NVPTX, R600, X86).  NVPTX lowers big or
variable-sized memory intrinsics before SelectionDAG construction.  All other
memory intrinsics are inlined (the threshold is set very high for this target).
R600 doesn't support memcpy / memset / memmove library calls (previously the
illegal emission of a call to such library function triggered an error
somewhere in the code generator).  X86 now emits inline loads and stores for
address spaces 256 and 257 up to the same threshold that is used for address
space 0 and reports a fatal error otherwise.

I call this a "partial fix" because there are still cases that can't be
lowered.  A fatal error is reported in these cases.

Reviewers: arsenm, theraven, compnerd, hfinkel

Subscribers: hfinkel, llvm-commits, alex

Differential Revision: http://reviews.llvm.org/D7241

llvm-svn: 255441
2015-12-12 21:33:31 +00:00
Xinliang David Li d1bab96045 [PGO] Stop using invalid char in instr variable names.
Before the patch, -fprofile-instr-generate compile will fail
if no integrated-as is specified when the file contains
any static functions (the -S output is also invalid).

This is the second try. The fix in this patch is very localized.
Only profile symbol names of profile symbols with internal 
linkage are fixed up while initializer of name syms are not 
changes. This means there is no format change nor version bump.

llvm-svn: 255434
2015-12-12 17:28:03 +00:00
Sanjay Patel 1d49fc9b27 [InstCombine] canonicalize (bitcast (extractelement X)) --> (extractelement(bitcast X))
This change was discussed in D15392. It allows us to remove the fold that was added
in:
http://reviews.llvm.org/r255261

...and it will allow us to generalize this fold:
http://reviews.llvm.org/rL112232

while preserving the order of bitcast + extract that it produces and testing shows
is better handled by the backend.

Note that the existing check for "isVectorTy()" wasn't strong enough in general
and specifically because: x86_mmx. It's not a vector, but it's not vectorizable
either. So here we check VectorType::isValidElementType() directly before 
proceeding with the transform.

llvm-svn: 255433
2015-12-12 16:44:48 +00:00
Simon Pilgrim a2d1591876 [X86][AVX] Tests tidyup
Cleanup/regenerate some tests for some upcoming patches.

llvm-svn: 255432
2015-12-12 12:52:52 +00:00
David Majnemer 550654aaf1 Move catchpad-phi-cast.ll to the X86 specific subdirectory
It is X86 specific and will not be properly exercised unless LLVM is
built with the X86 target.

llvm-svn: 255426
2015-12-12 06:21:08 +00:00
David Majnemer 8a1c45d6e8 [IR] Reformulate LLVM's EH funclet IR
While we have successfully implemented a funclet-oriented EH scheme on
top of LLVM IR, our scheme has some notable deficiencies:
- catchendpad and cleanupendpad are necessary in the current design
  but they are difficult to explain to others, even to seasoned LLVM
  experts.
- catchendpad and cleanupendpad are optimization barriers.  They cannot
  be split and force all potentially throwing call-sites to be invokes.
  This has a noticable effect on the quality of our code generation.
- catchpad, while similar in some aspects to invoke, is fairly awkward.
  It is unsplittable, starts a funclet, and has control flow to other
  funclets.
- The nesting relationship between funclets is currently a property of
  control flow edges.  Because of this, we are forced to carefully
  analyze the flow graph to see if there might potentially exist illegal
  nesting among funclets.  While we have logic to clone funclets when
  they are illegally nested, it would be nicer if we had a
  representation which forbade them upfront.

Let's clean this up a bit by doing the following:
- Instead, make catchpad more like cleanuppad and landingpad: no control
  flow, just a bunch of simple operands;  catchpad would be splittable.
- Introduce catchswitch, a control flow instruction designed to model
  the constraints of funclet oriented EH.
- Make funclet scoping explicit by having funclet instructions consume
  the token produced by the funclet which contains them.
- Remove catchendpad and cleanupendpad.  Their presence can be inferred
  implicitly using coloring information.

N.B.  The state numbering code for the CLR has been updated but the
veracity of it's output cannot be spoken for.  An expert should take a
look to make sure the results are reasonable.

Reviewers: rnk, JosephTremoulet, andrew.w.kaylor

Differential Revision: http://reviews.llvm.org/D15139

llvm-svn: 255422
2015-12-12 05:38:55 +00:00
Chen Li 1b26b9ec9d [X86ISelLowering] Add additional support for multiplication-to-shift conversion.
Summary: This patch adds support of conversion (mul x, 2^N + 1) => (add (shl x, N), x) and (mul x, 2^N - 1) => (sub (shl x, N), x) if the multiplication can not be converted to LEA + SHL or LEA + LEA. LLVM has already supported this on ARM, and it should also be useful on X86. Note the patch currently only applies to cases where the constant operand is positive, and I am planing to add another patch to support negative cases after this.

Reviewers: craig.topper, RKSimon

Subscribers: aemerson, llvm-commits

Differential Revision: http://reviews.llvm.org/D14603

llvm-svn: 255415
2015-12-12 01:04:15 +00:00
Hal Finkel 4d3da9c29b Fix test/CodeGen/PowerPC/ppc-shrink-wrapping.ll after r255398
llvm-svn: 255414
2015-12-12 00:42:05 +00:00
Sanjay Patel 93f55dd36d [InstCombine] allow any pair of bitcasts to be combined
This change is discussed in D15392 and should allow us to effectively
revert:
http://llvm.org/viewvc/llvm-project?view=revision&revision=255261
if we canonicalize bitcasts ahead of extracts.

It should be safe to convert any pair of bitcasts into a single bitcast, 
however, it was mentioned here:
http://lists.llvm.org/pipermail/llvm-commits/Week-of-Mon-20110829/127089.html
that we're not allowed to bitcast from an x86_mmx to some other types, but I'm 
not seeing any failures from that, and we have regression tests in CodeGen/X86
that appear to cover all of those cases. 

Some day we'll get to remove that MMX wart from LLVM IR completely?

Differential Revision: http://reviews.llvm.org/D15468

llvm-svn: 255399
2015-12-12 00:33:36 +00:00
Hal Finkel 65539e3c94 [PowerPC] Add Branch Hints for Highly-Biased Branches
This branch adds hints for highly biased branches on the PPC architecture. Even
in absence of profiling information, LLVM will mark code reaching unreachable
terminators and other exceptional control flow constructs as highly unlikely to
be reached.

Patch by Tom Jablin!

llvm-svn: 255398
2015-12-12 00:32:00 +00:00
Chen Li 02ef2e1385 Revert rL255391: [X86ISelLowering] Add additional support for multiplication-to-shift conversion.
because it broke buildbot.

llvm-svn: 255395
2015-12-12 00:08:37 +00:00
Sanjay Patel ffde9e14a2 use FileCheck for better checking
llvm-svn: 255394
2015-12-12 00:01:10 +00:00
Derek Schuff 9769debf88 [WebAssembly] Implement prolog/epilog insertion and FrameIndex elimination
Summary:
Use the SP32 physical register as the base for FrameIndex
lowering. Update it and the __stack_pointer global var in the prolog and
epilog. Extend the mapping of virtual registers to wasm locals to
include the physical registers.

Rather than modify the target-independent PrologEpilogInserter (which
asserts that there are no virtual registers left) include a
slightly-modified copy for Wasm that does not have this assertion and
only clears the virtual registers if scavenging was needed (which of
course it isn't for wasm).

Differential Revision: http://reviews.llvm.org/D15344

llvm-svn: 255392
2015-12-11 23:49:46 +00:00
Chen Li e8f9387e0c [X86ISelLowering] Add additional support for multiplication-to-shift conversion.
Summary: This patch adds support of conversion (mul x, 2^N + 1) => (add (shl x, N), x) and (mul x, 2^N - 1) => (sub (shl x, N), x) if the multiplication can not be converted to LEA + SHL or LEA + LEA. LLVM has already supported this on ARM, and it should also be useful on X86. Note the patch currently only applies to cases where the constant operand is positive, and I am planing to add another patch to support negative cases after this.

Reviewers: craig.topper, RKSimon

Subscribers: aemerson, llvm-commits

Differential Revision: http://reviews.llvm.org/D14603

llvm-svn: 255391
2015-12-11 23:39:32 +00:00
Matt Arsenault fabab4b7dd SelectionDAG: Match min/max if the scalar operation is legal
llvm-svn: 255388
2015-12-11 23:16:47 +00:00
Hal Finkel cd8664c3c2 Revert r248483, r242546, r242545, and r242409 - absdiff intrinsics
After much discussion, ending here:

  http://lists.llvm.org/pipermail/llvm-commits/Week-of-Mon-20151123/315620.html

it has been decided that, instead of having the vectorizer directly generate
special absdiff and horizontal-add intrinsics, we'll recognize the relevant
reduction patterns during CodeGen. Accordingly, these intrinsics are not needed
(the operations they represent can be pattern matched, as is already done in
some backends). Thus, we're backing these out in favor of the current
development work.

r248483 - Codegen: Fix llvm.*absdiff semantic.
r242546 - [ARM] Use [SU]ABSDIFF nodes instead of intrinsics for VABD/VABA
r242545 - [AArch64] Use [SU]ABSDIFF nodes instead of intrinsics for ABD/ABA
r242409 - [Codegen] Add intrinsics 'absdiff' and corresponding SDNodes for absolute difference operation

llvm-svn: 255387
2015-12-11 23:11:52 +00:00
Sanjay Patel d497ad43da Add tests for bitcast-bitcast sequences for all scalar/vector permutations
As noted in http://reviews.llvm.org/D15392 , we should be able to improve this.

llvm-svn: 255370
2015-12-11 20:26:30 +00:00
Xinliang David Li a86545b0b5 [PGO] Revert r255365: solution incomplete, not handling lambda yet
llvm-svn: 255369
2015-12-11 20:23:22 +00:00
Xinliang David Li c79283ef29 [PGO] Stop using invalid char in instr variable names.
Before the patch, -fprofile-instr-generate compile will fail
if no integrated-as is specified when the file contains
any static functions (the -S output is also invalid).

This patch fixed the issue. With the change, the index format
version will be bumped up by 1. Backward compatibility is 
preserved with this change.

Differential Revision: http://reviews.llvm.org/D15243

llvm-svn: 255365
2015-12-11 19:53:19 +00:00
Matthias Braun 60d69e2865 CodeGen: Redo analyzePhysRegs() and computeRegisterLiveness()
computeRegisterLiveness() was broken in that it reported dead for a
register even if a subregister was alive. I assume this was because the
results of analayzePhysRegs() are hard to understand with respect to
subregisters.

This commit: Changes the results of analyzePhysRegs (=struct
PhysRegInfo) to be clearly understandable, also renames the fields to
avoid silent breakage of third-party code (and improve the grammar).

Fix all (two) users of computeRegisterLiveness() in llvm: By reenabling
it and removing workarounds for the bug.

This fixes http://llvm.org/PR24535 and http://llvm.org/PR25033

Differential Revision: http://reviews.llvm.org/D15320

llvm-svn: 255362
2015-12-11 19:42:09 +00:00
Chad Rosier d7634fc91d Revert r255247, r255265, and r255286 due to serious compile-time regressions.
Revert "[DSE] Disable non-local DSE to see if the bots go green."
Revert "[DeadStoreElimination] Use range-based loops. NFC."
Revert "[DeadStoreElimination] Add support for non-local DSE."

llvm-svn: 255354
2015-12-11 18:39:41 +00:00
Frederic Riss 841b1732df [dsymutil] Ignore absolute symbols in the debug map
Quoting from the comment added to the code:

    // Objective-C on i386 uses artificial absolute symbols to
    // perform some link time checks. Those symbols have a fixed 0
    // address that might conflict with real symbols in the object
    // file. As I cannot see a way for absolute symbols to find
    // their way into the debug information, let's just ignore those.

llvm-svn: 255350
2015-12-11 17:50:37 +00:00
James Molloy 1bb6ea5e2d [Mem2Reg] Respect optnone
Mem2Reg shouldn't be optimizing a function that is marked
optnone. There is a test checking this that fails when mem2reg is
explicitly added to the standard pass pipeline.

llvm-svn: 255336
2015-12-11 13:36:59 +00:00
James Molloy 37b82e79b2 [InstCombine] Make MatchBSwap also match bit reversals
MatchBSwap has most of the functionality to match bit reversals already. If we switch it from looking at bytes to individual bits and remove a few early exits, we can extend the main recursive function to match any sequence of ORs, ANDs and shifts that assemble a value from different parts of another, base value. Once we have this bit->bit mapping, we can very simply detect if it is appropriate for a bswap or bitreverse.

llvm-svn: 255334
2015-12-11 10:04:51 +00:00
Kyle Butt 1452b76f1f [PPC]: Peephole optimize small accesss to aligned globals.
Access to aligned globals gives us a chance to peephole optimize nonzero
offsets. If a struct is 4 byte aligned, then accesses to bytes 0-3 won't
overflow the available displacement. For example:
        addis 3, 2, b4v@toc@ha
        addi 4, 3, b4v@toc@l
        lbz 5, b4v@toc@l(3) ; This is the result of the current peephole
        lbz 6, 1(4)         ; optimizer
        lbz 7, 2(4)
        lbz 8, 3(4)
If b4v is 4-byte aligned, we can skip using register 4 because we know
that b4v@toc@l+{1,2,3} won't overflow 32K, and instead generate:
        addis 3, 2, b4v@toc@ha
        lbz 4, b4v@toc@l(3)
        lbz 5, b4v@toc@l+1(3)
        lbz 6, b4v@toc@l+2(3)
        lbz 7, b4v@toc@l+3(3)
Saving a register and an addition.
Larger alignments allow larger structures/arrays to be optimized.

llvm-svn: 255319
2015-12-11 00:47:36 +00:00
Cong Hou 59898d8c68 [X86][SSE] Update the cost table for integer-integer conversions on SSE2/SSE4.1.
Previously in the conversion cost table there are no entries for integer-integer
conversions on SSE2. This will result in imprecise costs for certain vectorized
operations. This patch adds those entries for SSE2 and SSE4.1. The cost numbers
are counted from the result of running llc on the new test case in this patch.


Differential revision: http://reviews.llvm.org/D15132

llvm-svn: 255315
2015-12-11 00:31:39 +00:00
Eric Christopher 325e8d06dc Fix (bitcast (fabs x)), (bitcast (fneg x)) and (bitcast (fcopysign cst,
x)) combines for ppc_fp128, since signbit computation is more
complicated.

Discussion thread:
http://lists.llvm.org/pipermail/llvm-dev/2015-November/092863.html

Patch by Tim Shen!

llvm-svn: 255305
2015-12-10 22:09:06 +00:00
Kyle Butt 28b01a51b3 PPC: Teach FMA mutate to respect register classes.
This was causing bad code gen and assembly that won't assemble, as
mixed altivec and vsx code would end up with a vsx high register
assigned to an altivec instruction, which won't work. Constraining the
classes allows the optimization to proceed.

llvm-svn: 255299
2015-12-10 21:28:40 +00:00
JF Bastien 82bf85ffed EarlyCSE: add tests
Summary: As a follow-up to rL255054 I wasn't able to convince myself that the code did what I thought, so I wrote more tests.

Reviewers: reames

Subscribers: llvm-commits

Differential Revision: http://reviews.llvm.org/D15371

llvm-svn: 255295
2015-12-10 20:24:34 +00:00
Simon Pilgrim 06ea4be281 [DAGCombiner] Fix PR25763 - vector comparison constant folding + sign-extension
PR25763 demonstrated an issue with D14683 - vector comparison constant folding only works for i1 results, so we need to split off the sign-extension of the result to the required type. Luckily this can be done with the existing type legalization code.

llvm-svn: 255289
2015-12-10 19:47:06 +00:00
Chad Rosier 843c7b4309 [DSE] Disable non-local DSE to see if the bots go green.
I see a few bots timing out, so I'm speculatively disabling r255247.

llvm-svn: 255286
2015-12-10 19:23:02 +00:00
Rafael Espindola a8547d35e9 Fix another case where the linkage was not set.
llvm-svn: 255272
2015-12-10 18:44:26 +00:00
Rong Xu 2611ff8a27 [PGO] Use %t as the temporary profdata filename in the test cases.
Using %t rather %T/<specific_name> as the temporary profdata filename.

llvm-svn: 255271
2015-12-10 18:24:44 +00:00
Pirama Arumuga Nainar 1317d5f311 Fix fptosi, fptoui from f16 vectors to i8, i16 vectors
Summary:
Convert f16 vectors to corresponding f32 vectors before doing the
conversion to int.

Add tests for v4f16, v8f16.

Reviewers: ab, jmolloy

Subscribers: llvm-commits, srhines

Differential Revision: http://reviews.llvm.org/D14936

llvm-svn: 255263
2015-12-10 17:16:49 +00:00
Sanjay Patel c83fd9554a [InstCombine] fold bitcasts around an extractelement (3rd try)
This is a redo of r255137 (reverted at r255227) which was a redo of 
r255124 (reverted at r255126) with a fixed check for a scalar source 
type and an added test for the failure that caused the revert.

Original commit message:

Example:
  bitcast (extractelement (bitcast <2 x float> %X to <2 x i32>), 1) to float
    --->
  extractelement <2 x float> %X, i32 1

This is part of fixing PR25543:
https://llvm.org/bugs/show_bug.cgi?id=25543

The next step will be to generalize this fold:
trunc ( lshr ( bitcast X) ) -> extractelement (X)

Ie, I'm hoping to replace the existing transform of:
bitcast ( trunc ( lshr ( bitcast X)))
added by:
http://reviews.llvm.org/rL112232

with 2 less specific transforms to catch the case in the bug report.

Differential Revision: http://reviews.llvm.org/D14879

llvm-svn: 255261
2015-12-10 17:09:28 +00:00
Dan Gohman 28818d7840 [WebAssembly] Tighten up several CHECK tests.
llvm-svn: 255255
2015-12-10 14:52:34 +00:00
Rafael Espindola caabe22832 Slit lib/Linker in two.
A linker normally has two stages: symbol resolution and "moving stuff".

In lib/Linker there is the complication of lazy linking some globals,
but it was still far more mixed than it needed to.

This splits the linker into a lower level IRMover and the linker proper.
The IRMover just takes a list of globals to move and a callback that
lets the user control what is lazy linked.

The main motivation is that now tools/gold (and soon lld) can use their
own symbol resolution to instruct IRMover what to do.

llvm-svn: 255254
2015-12-10 14:19:35 +00:00
Chad Rosier 533bc3fcac [DeadStoreElimination] Add support for non-local DSE.
We extend the search for redundant stores to predecessor blocks that
unconditionally lead to the block BB with the current store instruction.  That
also includes single-block loops that unconditionally lead to BB, and
if-then-else blocks where then- and else-blocks unconditionally lead to BB.

http://reviews.llvm.org/D13363
Patch by Ivan Baev <ibaev@codeaurora.org>!

llvm-svn: 255247
2015-12-10 13:51:43 +00:00
Nemanja Ivanovic ac8d01add0 Bitcasts between FP and INT values using direct moves
This patch corresponds to review:
http://reviews.llvm.org/D15286

LLVM IR frequently contains bitcast operations between floating point and
integer values of the same width. Doing this through memory operations is
quite expensive on PPC. This patch allows the use of direct register moves
between FPRs and GPRs for lowering bitcasts.

llvm-svn: 255246
2015-12-10 13:35:28 +00:00
Amjad Aboud a9bcf16ebc Macro debug info support in LLVM IR
Introduced DIMacro and DIMacroFile debug info metadata in the LLVM IR to support macros.

Differential Revision: http://reviews.llvm.org/D14687

llvm-svn: 255245
2015-12-10 12:56:35 +00:00
Akira Hatanaka a3c0e8e1ba Revert r255137.
This commit broke apple's internal bot.

llvm-svn: 255227
2015-12-10 08:00:52 +00:00
Dan Gohman f170ba08af [WebAssembly] Implement mixed-type ISD::FCOPYSIGN.
ISD::FCOPYSIGN permits its operands to have differing types, and DAGCombiner
uses this. Add some def : Pat rules to expand this out into an explicit
conversion and a normal copysign operation.

llvm-svn: 255220
2015-12-10 04:55:31 +00:00
Dan Gohman 9341c1d4b3 [WebAssembly] Implement fma.
It is lowered to a libcall for now, but this is expected to change in the future.

llvm-svn: 255219
2015-12-10 04:52:33 +00:00
Tom Stellard c93fc11f36 AMDGPU/SI: Emit constant arrays in the .text section
Summary:
This allows us to remove the END_OF_TEXT_LABEL hack we had been using
and simplifies the fixups used to compute the address of constant
arrays.

Reviewers: arsenm

Subscribers: arsenm, llvm-commits

Differential Revision: http://reviews.llvm.org/D15257

llvm-svn: 255204
2015-12-10 02:13:01 +00:00
Tom Stellard b3c3bda512 AMDGPU/SI: Add support for sgpr and vgpr inline assembly constraints
Summary: The 's' constraint represents sgprs and the 'v' constraint represents vgprs.

Reviewers: arsenm, echristo

Subscribers: arsenm, llvm-commits

Differential Revision: http://reviews.llvm.org/D15342

llvm-svn: 255203
2015-12-10 02:12:53 +00:00
Dan Gohman 60bddf17c5 [WebAssembly] Fix legalization of f32->f64 EXTLOAD.
llvm-svn: 255202
2015-12-10 02:07:53 +00:00
Dan Gohman a5603b835b [WebAssembly] Also legalize sign_extend_inreg of i32->i64.
llvm-svn: 255191
2015-12-10 01:00:19 +00:00
Dan Gohman dab313e0ed PeepholeOptimizer: Ignore dead implicit defs
Target-specific instructions may have uninteresting physreg clobbers,
for target-specific reasons. The peephole pass doesn't need to concern
itself with such defs, as long as they're implicit and marked as dead.

llvm-svn: 255182
2015-12-10 00:37:51 +00:00
Dan Gohman a8483755d3 [WebAssembly] Fix legalization of shift operators with illegal types.
llvm-svn: 255181
2015-12-10 00:26:26 +00:00
Dan Gohman df00a9ebc2 [WebAssembly] Implement anyext.
llvm-svn: 255179
2015-12-10 00:17:35 +00:00
Quentin Colombet 5d2f7cfd44 [X86] Enable shrink-wrapping by default, but keep it disabled for stack frames
without a frame pointer when unwind may happen.
This is a workaround for a bug in the way we emit the CFI directives for
frameless unwind information. See PR25614.

llvm-svn: 255175
2015-12-09 23:08:18 +00:00
Rafael Espindola ed11bd286f Synchronize the logic for deciding to link a gv.
We were deciding to not link an available_externally gv over a
declaration, but then copying over the body anyway.

llvm-svn: 255169
2015-12-09 22:44:00 +00:00
Rong Xu 7dd9b1ea75 [PGO] Rename the profdata filename to avoid the conflict b/w tests.
Two tests diag_mismatch.ll and diag_no_funcprofdata.ll generates the same
profdata filename which can conflict in current test runs. This patch
renames them to have different names. 

llvm-svn: 255158
2015-12-09 21:27:59 +00:00
Justin Bogner b7389d6714 IR: Make ConstantDataArray::getFP actually return a ConstantDataArray
The ConstantDataArray::getFP(LLVMContext &, ArrayRef<uint16_t>)
overload has had a typo in it since it was written, where it will
create a Vector instead of an Array. This obviously doesn't work at
all, but it turns out that until r254991 there weren't actually any
callers of this overload. Fix the typo and add some test coverage.

llvm-svn: 255157
2015-12-09 21:21:07 +00:00
Reid Kleckner 54ade23504 [Float2Int] Don't operate on vector instructions
This fixes a crash bug. It's also not clear if we'd want to do this
transform for vectors.

llvm-svn: 255155
2015-12-09 21:08:18 +00:00
Sanjoy Das 9abfb0b429 Use WeakVH to keep track of calls with operand bundles in CloneCodeInfo
`CloneAndPruneIntoFromInst` can DCE instructions after cloning them into
the new function, and so an AssertingVH is too strong.  This change
switches CloneCodeInfo to use a std::vector<WeakVH>.

llvm-svn: 255148
2015-12-09 20:33:52 +00:00
Sanjay Patel b67e6b6044 [InstCombine] fold bitcasts around an extractelement (2nd try)
This is a redo of r255124 (reverted at r255126) with an added check for a
scalar destination type and an added test for the failure seen in Clang's
test/CodeGen/vector.c. The extra test shows a different missing optimization.

Original commit message:

Example:
  bitcast (extractelement (bitcast <2 x float> %X to <2 x i32>), 1) to float
    --->
  extractelement <2 x float> %X, i32 1

This is part of fixing PR25543:
https://llvm.org/bugs/show_bug.cgi?id=25543

The next step will be to generalize this fold:
trunc ( lshr ( bitcast X) ) -> extractelement (X)

Ie, I'm hoping to replace the existing transform of:
bitcast ( trunc ( lshr ( bitcast X)))
added by:
http://reviews.llvm.org/rL112232

with 2 less specific transforms to catch the case in the bug report.

Differential Revision: http://reviews.llvm.org/D14879

llvm-svn: 255137
2015-12-09 18:57:16 +00:00
Rong Xu f430ae40cf [PGO] Resubmit "MST based PGO instrumentation infrastructure" (r254021)
This new patch fixes a few bugs that exposed in last submit. It also improves
the test cases.
--Original Commit Message--
This patch implements a minimum spanning tree (MST) based instrumentation for
PGO. The use of MST guarantees minimum number of CFG edges getting
instrumented. An addition optimization is to instrument the less executed
edges to further reduce the instrumentation overhead. The patch contains both the
instrumentation and the use of the profile to set the branch weights.

Differential Revision: http://reviews.llvm.org/D12781

llvm-svn: 255132
2015-12-09 18:08:16 +00:00
Mehdi Amini 4e2b7c454c Revert "[InstCombine] fold bitcasts around an extractelement"
This reverts commit r255124.

Broke http://lab.llvm.org:8011/builders/llvm-clang-lld-x86_64-scei-ps4-ubuntu-fast/builds/4193/steps/test/logs/stdio

From: Mehdi Amini <mehdi.amini@apple.com>
llvm-svn: 255126
2015-12-09 16:31:39 +00:00
Dan Gohman 1cf96c0c34 [WebAssembly] Reintroduce ARGUMENT moving logic
Reinteroduce the code for moving ARGUMENTS back to the top of the basic block.
While the ARGUMENTS physical register prevents sinking and scheduling from
moving them, it does not appear to be sufficient to prevent SelectionDAG from
moving them down in the initial schedule. This patch introduces a patch that
moves them back to the top immediately after SelectionDAG runs.

This is still hopefully a temporary solution. http://reviews.llvm.org/D14750 is
one alternative, though the review has not been favorable, and proposed
alternatives are longer-term and have other downsides.

This fixes the main outstanding -verify-machineinstrs failures, so it adds
-verify-machineinstrs to several tests.

Differential Revision: http://reviews.llvm.org/D15377

llvm-svn: 255125
2015-12-09 16:23:59 +00:00
Sanjay Patel 07410ed234 [InstCombine] fold bitcasts around an extractelement
Example:
  bitcast (extractelement (bitcast <2 x float> %X to <2 x i32>), 1) to float
    --->
  extractelement <2 x float> %X, i32 1

This is part of fixing PR25543:
https://llvm.org/bugs/show_bug.cgi?id=25543

The next step will be to generalize this fold:
trunc ( lshr ( bitcast X) ) -> extractelement (X)

Ie, I'm hoping to replace the existing transform of:
bitcast ( trunc ( lshr ( bitcast X)))
added by:
http://reviews.llvm.org/rL112232

with 2 less specific transforms to catch the case in the bug report.

Differential Revision: http://reviews.llvm.org/D14879

llvm-svn: 255124
2015-12-09 16:17:20 +00:00
Mehdi Amini b000bbdec2 Change hasUniqueInitializer() to call isStrongDefinitionForLinker() instead of !isWeakForLinker()
Summary:
Available_externally global variable with initializer were considered "hasInitializer()",
while obviously it can't match the description:

    Whether the global variable has an initializer, and any changes made to the
    initializer will turn up in the final executable.

since modifying the initializer of an externally available variable does not make sense.

Reviewers: pcc, rafael

Subscribers: llvm-commits

Differential Revision: http://reviews.llvm.org/D15351

From: Mehdi Amini <mehdi.amini@apple.com>
llvm-svn: 255123
2015-12-09 16:17:07 +00:00
Tim Northover d91d635b36 ARM: don't use a deleted node as the BaseReg in complex pattern.
We mutated the DAG, which invalidated the node we were trying to use
as a base register. Sometimes we got away with it, but other times the
node really did get deleted before it was finished with.

Should fix PR25733

llvm-svn: 255120
2015-12-09 15:54:50 +00:00
Robert Lougher f0033b29d4 Fix cycle in selection DAG introduced by extractelement legalization
During selection DAG legalization, extractelement is replaced with a load
instruction.  To do this, a temporary store to the stack is used unless an
existing store is found that can be re-used.
    
If re-using a store, the chain going out of the store must be replaced by
the one going out of the new load (this ensures that any stores that must
take place after the store happens after the load, else the value might
be overwritten before it is loaded).
    
The problem is, if the extractelement index is dependent on the store
replacing the chain will introduce a cycle in the selection DAG (the load
uses the index, and by replacing the chain we will make the index dependent
on the load).
    
To fix this, if the index is dependent on the store, the store is skipped.
This is conservative as we may end up creating an unnecessary extra store
to the stack.  However, the situation is not expected to occur very often.

Differential Revision: http://reviews.llvm.org/D15330

llvm-svn: 255114
2015-12-09 14:34:10 +00:00
Oliver Stannard 86f729296a [AArch64] Fix FP16 vector instructions that should only accept low registers
llvm-svn: 255113
2015-12-09 14:32:11 +00:00
Daniel Sanders 3c7223133d [mips][ias] Range check uimm10 operands
Summary:

Reviewers: vkalintiris

Subscribers: dsanders, llvm-commits

Differential Revision: http://reviews.llvm.org/D15229

llvm-svn: 255112
2015-12-09 13:48:05 +00:00
Zlatko Buljan 48f1f39bfe Revert r254897 "[mips][microMIPS] Implement LH, LHE, LHU and LHUE instructions"
Commited patch was intended to implement LH, LHE, LHU and LHUE instructions.
After commit test-suite failed with error message in the form of:
fatal error: error in backend: Cannot select: t124: i32,ch = load<LD2[%d](tbaa=<0x94acc48>), sext from i16> t0, t2, undef:i32
For that reason I decided to revert commit r254897 and make new patch which besides implementation and standard regression tests will also have dedicated tests (CodeGen) for the above error. 

llvm-svn: 255109
2015-12-09 13:07:45 +00:00
Mehdi Amini ceca971576 Revert "Implement a new pass - LiveDebugValues - to compute the set of live DEBUG_VALUEs at each basic block and insert them. Reviewed and accepted at: http://reviews.llvm.org/D11933"
This reverts commit r255096.

Break the bots: http://lab.llvm.org:8080/green/job/clang-stage1-cmake-RA-incremental_check/16378/

From: Mehdi Amini <mehdi.amini@apple.com>
llvm-svn: 255101
2015-12-09 08:17:42 +00:00
Vikram TV 0876d2d5cf Implement a new pass - LiveDebugValues - to compute the set of live DEBUG_VALUEs at each basic block and insert them. Reviewed and accepted at: http://reviews.llvm.org/D11933
llvm-svn: 255096
2015-12-09 05:49:14 +00:00
Ahmed Bougacha 97564c3a1b [AArch64][ARM] Don't base interleaved op legality on type alloc size.
Otherwise, we think that most types that look like they'd fit in a
legal vector type are legal (so, basically, *any* vector type with a
size between 33 and 128 bits, I think, since we use pow2 alignment;
e.g., v2i25, v3f32, ...).

DataLayout::getTypeAllocSize rounds up based on alignment.
When checking for target intrinsic legality, that's not what we want:
if rounding makes a difference, the type isn't legal, and the
target intrinsics shouldn't be used, as they are always assumed legal.

One could make the argument that alloc size is ultimately the most
relevant here, since we're dealing with LD/ST intrinsics. That's only
true if we did legalize them though; that's a problem for another day.

Use DataLayout::getTypeSizeInBits instead of getTypeAllocSizeInBits.
Type::getSizeInBits can't be used because that'd gratuitously break
pointer vector support.

Some of these uses are currently fine, because we only hit them when
the type is already known legal (e.g., r114454). Update them for
consistency. It's faster to avoid the rounding anyway!

llvm-svn: 255089
2015-12-09 01:19:50 +00:00
Sanjoy Das b8dced5dfa Don't drop attributes when inlining through "deopt" operand bundles
Test case attached (test case also checks that we don't drop the calling
convention, but that functionality was correct before this patch).

llvm-svn: 255088
2015-12-09 01:01:28 +00:00
Vyacheslav Klochkov a3cd08b05c X86-FMA3: Defined the ExeDomain property for Scalar FMA3 opcodes.
Reviewer: Simon Pilgrim.
Differential Revision: http://reviews.llvm.org/D15317

llvm-svn: 255080
2015-12-09 00:12:13 +00:00
Sanjoy Das 48945cdc15 [OperandBundles] Have PruneEH work correct with operand bundles.
For an invoke with operand bundles, the [op_begin(), op_end()-3] range
can contain things other than invoke arguments.  This change teaches
PruneEH to use arg_begin() and arg_end() explicitly.

llvm-svn: 255073
2015-12-08 23:16:52 +00:00
Pirama Arumuga Nainar e6ccd7b66a Define selection for v4f16, v8f16 scalar_to_vector
Summary:
This fixes failure when trying to select
    insertelement <4 x half> undef, half %a, i64 0
which gets transformed to a scalar_to_vector node.

The accompanying v4 and v8 tests fail instruction selection without this
patch.

Reviewers: ab, jmolloy

Subscribers: srhines, llvm-commits

Differential Revision: http://reviews.llvm.org/D15322

llvm-svn: 255072
2015-12-08 23:07:06 +00:00
Reid Kleckner 8de1fe23ed [CGP] Reimplement r255055 a different way
llvm-svn: 255070
2015-12-08 23:00:03 +00:00
Reid Kleckner e18f92bfe9 Revert "[CGP] Check that we have an insert point before moving llvm.dbg.value around"
This reverts commit r255055.

Breakage has been reported.

llvm-svn: 255063
2015-12-08 22:33:23 +00:00
Sanjoy Das 8a954a0553 [OperandBundles] Fix a transform in simplifycfg
Reviewers: pcc, majnemer, reames

Subscribers: reames, llvm-commits

Differential Revision: http://reviews.llvm.org/D15345

llvm-svn: 255062
2015-12-08 22:26:08 +00:00
Simon Pilgrim 323e00d9c7 [X86][AVX] Fold loads + splats into broadcast instructions
On AVX and AVX2, BROADCAST instructions can load a scalar into all elements of a target vector.

This patch improves the lowering of 'splat' shuffles of a loaded vector into a broadcast - currently the lowering only works for cases where we are splatting the zero'th element, which is now generalised to any element.

Fix for PR23022

Differential Revision: http://reviews.llvm.org/D15310

llvm-svn: 255061
2015-12-08 22:17:11 +00:00
Reid Kleckner 7c005324d5 [CGP] Check that we have an insert point before moving llvm.dbg.value around
llvm-svn: 255055
2015-12-08 21:50:52 +00:00
Philip Reames 8fc2cbf933 [EarlyCSE] Value forwarding for unordered atomics
This patch teaches the fully redundant load part of EarlyCSE how to forward from atomic and volatile loads and stores, and how to eliminate unordered atomics (only). This patch does not include dead store elimination support for unordered atomics, that will follow in the near future.

The basic idea is that we allow all loads and stores to be tracked by the AvailableLoad table. We store a bit in the table which tracks whether load/store was atomic, and then only replace atomic loads with ones which were also atomic.

No attempt is made to refine our handling of ordered loads or stores. Those are still treated as full fences. We could pretty easily extend the release fence handling to release stores, but that should be a separate patch.

Differential Revision: http://reviews.llvm.org/D15337

llvm-svn: 255054
2015-12-08 21:45:41 +00:00
Simon Pilgrim 0aea1b89eb [X86][SSE4A] Added fast-isel intrinsics tests
As discussed on PR24580, this patch adds fast-isel codegen tests to match the IR generated in clang/test/CodeGen/sse4a-builtins.c

llvm-svn: 255053
2015-12-08 21:43:41 +00:00
Simon Pilgrim 0ca7cb6334 [X86][SSSE3] Added fast-isel intrinsics tests
As discussed on PR24580, this patch adds fast-isel codegen tests to match the IR generated in clang/test/CodeGen/ssse3-builtins.c

llvm-svn: 255052
2015-12-08 21:32:08 +00:00
Simon Pilgrim 9d76810949 [X86][SSE3] Added fast-isel intrinsics tests
As discussed on PR24580, this patch adds fast-isel codegen tests to match the IR generated in clang/test/CodeGen/sse3-builtins.c

llvm-svn: 255051
2015-12-08 21:27:19 +00:00
Artyom Skrobov 0a37b80bcb Fix ARMv4T (Thumb1) epilogue generation
Summary:
Before ARMv5T, Thumb1 code could not pop PC, as described at D14357 and D14986;
so we need the special fixup in the epilogue.

Reviewers: jroelofs, qcolombet

Subscribers: aemerson, llvm-commits, rengolin

Differential Revision: http://reviews.llvm.org/D15126

llvm-svn: 255047
2015-12-08 19:59:01 +00:00
Mehdi Amini bddfbeaf59 Revert "Add Available Externally linkage type to isWeakForLinker()"
This reverts r255043, as per post-review concern were raised on the correctness.

From: Mehdi Amini <mehdi.amini@apple.com>
llvm-svn: 255045
2015-12-08 19:13:31 +00:00
Mehdi Amini 69e3ae8d4b Cleanup test: remove useless alignment
From: Mehdi Amini <mehdi.amini@apple.com>
llvm-svn: 255044
2015-12-08 19:02:55 +00:00
Mehdi Amini 37c25fa1d1 Add Available Externally linkage type to isWeakForLinker()
Per LangRef: "Globals with available_externally linkage are
allowed to be discarded at will, and are otherwise the same
as linkonce_odr", since linkonce_odr is in this list it makes
sense to have available_externally there as well.

Reviewers: rafael

Differential Revision: http://reviews.llvm.org/D15323

From: Mehdi Amini <mehdi.amini@apple.com>
llvm-svn: 255043
2015-12-08 19:01:29 +00:00
Tim Northover 614e8ff855 X86: produce more friendly errors during MachO relocation handling
llvm-svn: 255036
2015-12-08 18:31:35 +00:00
Renato Golin 412ee3d45d [ARM] Allowing SP/PC for AND/BIC mod_imm_not
AND/BIC instructions do accept SP/PC, so the register class should be
more generic (rGPR -> GPR) to cope with that case. Adding more tests.

llvm-svn: 255034
2015-12-08 18:10:58 +00:00
Ron Lieberman e6540e244a [Hexagon] Add NewValueJump support for C4_cmpneq, C4_cmplte, C4_cmplteu
llvm-svn: 255027
2015-12-08 16:28:32 +00:00
Daniel Sanders 106d2d4693 [mips][ias] Range check uimm8 operands
Summary:

Reviewers: vkalintiris

Subscribers: llvm-commits, dsanders

Differential Revision: http://reviews.llvm.org/D15226

llvm-svn: 255018
2015-12-08 14:42:10 +00:00
Daniel Sanders 59d092f883 [mips][ias] Range check uimm6 operands and fix a bug this revealed.
Summary:
We don't check the size operand on ext/dext*/ins/dins* yet because the
permitted range depends on the pos argument and we can't check that using
this mechanism.

The bug was that dextu/dinsu accepted 0..31 in the pos operand instead of 32..63.

Reviewers: vkalintiris

Subscribers: llvm-commits, dsanders

Differential Revision: http://reviews.llvm.org/D15190

llvm-svn: 255015
2015-12-08 13:49:19 +00:00
Oliver Stannard e4c3d21ea6 [AArch64] Add ARMv8.2-A FP16 vector instructions
ARMv8.2-A adds 16-bit floating point versions of all existing SIMD
floating-point instructions. This is an optional extension, so all of
these instructions require the FeatureFullFP16 subtarget feature.

Note that VFP without SIMD is not a valid combination for any version of
ARMv8-A, but I have ensured that these instructions all depend on both
FeatureNEON and FeatureFullFP16 for consistency.

The ".2h" vector type specifier is now legal (for the scalar pairwise
reduction instructions), so some unrelated tests have been modified as
different error messages are emitted. This is not a problem as the
invalid operands are still caught.

llvm-svn: 255010
2015-12-08 12:16:10 +00:00
Michael Zuckerman a520e9b30c dding test for fnstsw
continue of Wrong FNSTSW size operator
url: http://reviews.llvm.org/D14953


Differential Revision: http://reviews.llvm.org/D15155

llvm-svn: 255007
2015-12-08 12:00:24 +00:00
Justin Bogner 0ebc8605ad IR: Allow vectors of halfs to be ConstantDataVectors
Currently, vectors of halfs end up as ConstantVectors, but there isn't
a good reason they can't be ConstantDataVectors. This should save some
memory.

llvm-svn: 254991
2015-12-08 03:01:16 +00:00
Rafael Espindola 758f7794bb Add a test showing that we internalize lazily linked GVs.
llvm-svn: 254989
2015-12-08 02:38:14 +00:00
Justin Bogner 3135ba9b38 AsmPrinter: Use emitGlobalConstantFP to emit elements of constant data
It's strange to duplicate the logic for emitting FP values into
emitGlobalConstantDataSequential, and it's even stranger that we end
up printing the verbose assembly comments differently between the two
paths. Just call into emitGlobalConstantFP rather than crudely
duplicating its logic.

llvm-svn: 254988
2015-12-08 02:37:48 +00:00
Rafael Espindola a98a3be2c4 Simplify test. NFC.
llvm-svn: 254987
2015-12-08 02:29:45 +00:00
Manman Ren cb8470b4b5 [CXX TLS calling convention] Add support for AArch64.
rdar://9001553

llvm-svn: 254978
2015-12-08 00:14:38 +00:00
Sanjoy Das 683bf070ef [IndVars] Have getInsertPointForUses preserve LCSSA
Summary:
Also add a stricter post-condition for IndVarSimplify.

Fixes PR25578.  Test case by Michael Zolotukhin.

Reviewers: hfinkel, atrick, mzolotukhin

Subscribers: llvm-commits

Differential Revision: http://reviews.llvm.org/D15059

llvm-svn: 254977
2015-12-08 00:13:21 +00:00
Sanjoy Das b771eb6d69 [SCEVExpander] Have hoistIVInc preserve LCSSA
Summary:
(Note: the problematic invocation of hoistIVInc that caused PR24804 came
from IndVarSimplify, not from SCEVExpander itself)

Fixes PR24804.  Test case by David Majnemer.

Reviewers: hfinkel, majnemer, atrick, mzolotukhin

Subscribers: llvm-commits

Differential Revision: http://reviews.llvm.org/D15058

llvm-svn: 254976
2015-12-08 00:13:17 +00:00
NAKAMURA Takumi b4398d8585 Stabilize llvm/test/Object/archive-update.test a bit.
A manipulation (in this case, mkdir) can make slack between creating and touching %t.older/evenlen.

I would make this rewrote with python if this were still unstable.

llvm-svn: 254965
2015-12-07 23:15:57 +00:00
Kit Barton a1c712fae5 [PPC64] Convert bool literals to i32
Convert i1 values to i32 values if they should be allocated in GPRs instead of CRs.

Phabricator: http://reviews.llvm.org/D14064
llvm-svn: 254942
2015-12-07 20:50:29 +00:00
Simon Pilgrim 69aa463780 Fix line endings
llvm-svn: 254939
2015-12-07 20:36:00 +00:00
Ron Lieberman c5e20a41a0 [Hexagon] Adding v60 test, vasr in particular.
llvm-svn: 254923
2015-12-07 18:52:39 +00:00