Commit Graph

36888 Commits

Author SHA1 Message Date
Matt Arsenault 30d37a74da AMDGPU: Remove atomic inc/dec patterns
There is no benefit to these since materializing the constant 1
requires the same number of instructions as materializing uint_max

llvm-svn: 264215
2016-03-23 23:23:38 +00:00
Matt Arsenault 0a30e456b4 AMDGPU: Promote alloca should skip volatiles
llvm-svn: 264214
2016-03-23 23:17:29 +00:00
Matt Arsenault f43c2a0b49 AMDGPU: Insert moves of frame index to value operands
Strengthen tests of storing frame indices.

Right now this just creates irrelevant scheduling changes.

We don't want to have multiple frame index operands
on an instruction. There seem to be various assumptions
that at least the same frame index will not appear twice
in the LocalStackSlotAllocation pass.

There's no reason to have this happen, and it just
makes it easy to introduce bugs where the immediate
offset is appplied to the storing instruction when it should
really be applied to the value being stored as a separate
add.

This might not be sufficient. It might still be problematic
to have an add fi, fi situation, but that's even less unlikely
to happen in real code.

llvm-svn: 264200
2016-03-23 21:49:25 +00:00
Cong Hou 94710840fb Allow X86::COND_NE_OR_P and X86::COND_NP_OR_E to be reversed.
Currently, AnalyzeBranch() fails non-equality comparison between floating points
on X86 (see https://llvm.org/bugs/show_bug.cgi?id=23875). This is because this
function can modify the branch by reversing the conditional jump and removing
unconditional jump if there is a proper fall-through. However, in the case of
non-equality comparison between floating points, this can turn the branch
"unanalyzable". Consider the following case:

jne.BB1
jp.BB1
jmp.BB2
.BB1:
...
.BB2:
...

AnalyzeBranch() will reverse "jp .BB1" to "jnp .BB2" and then "jmp .BB2" will be
removed:

jne.BB1
jnp.BB2
.BB1:
...
.BB2:
...

However, AnalyzeBranch() cannot analyze this branch anymore as there are two
conditional jumps with different targets. This may disable some optimizations
like block-placement: in this case the fall-through behavior is enforced even if
the fall-through block is very cold, which is suboptimal.

Actually this optimization is also done in block-placement pass, which means we
can remove this optimization from AnalyzeBranch(). However, currently
X86::COND_NE_OR_P and X86::COND_NP_OR_E are not reversible: there is no defined
negation conditions for them.

In order to reverse them, this patch defines two new CondCode X86::COND_E_AND_NP
and X86::COND_P_AND_NE. It also defines how to synthesize instructions for them.
Here only the second conditional jump is reversed. This is valid as we only need
them to do this "unconditional jump removal" optimization.


Differential Revision: http://reviews.llvm.org/D11393

llvm-svn: 264199
2016-03-23 21:45:37 +00:00
Sanjay Patel 7876f180b5 [x86] make peekThroughBitcasts() a helper function
This should be hoisted further up so it can be used in DAGCombiner and other backends,
but I'm limiting the scope in the interest of patch minimalism.

It's not quite NFC because some of the replaced code was using an 'if' check rather
than a 'while' loop, so those cases would only look through a single bitcast.

llvm-svn: 264186
2016-03-23 20:16:37 +00:00
Chad Rosier 85c8594056 [AArch64] Replace return 0 with return false. NFC.
llvm-svn: 264185
2016-03-23 20:07:28 +00:00
Kyle Butt 613112826e Codegen: [PPC] Word Rotates are Zero Extending.
Add Word rotates to the list of instructions that are zero extending.
This allows them to be used in dot form to compare with zero.

llvm-svn: 264183
2016-03-23 19:51:22 +00:00
Artyom Skrobov e6f1b7f094 Replace a string comparison in ARMSubtarget.h with a tablegen entry in ARM.td (NFC)
Reviewers: rengolin, t.p.northover

Subscribers: aemerson, llvm-commits, rengolin

Differential Revision: http://reviews.llvm.org/D18393

llvm-svn: 264165
2016-03-23 16:18:13 +00:00
Oliver Stannard aa77b1e025 [AArch64] Replace some uses of report_fatal_error with reportError in AArch64 ELF object writer
If we can't handle a relocation type, report it as an error in the source,
rather than asserting. I've added a more descriptive message and a test for the
only cases of this that I've been able to trigger.

Differential Revision: http://reviews.llvm.org/D18388

llvm-svn: 264156
2016-03-23 13:45:03 +00:00
Andrey Turetskiy 6a3d561ea0 [X86] Introduction of FeatureX87.
Add FeatureX87 in X86 backend to be able to define CPUs which doesn't have x87.

Differential Revision: http://reviews.llvm.org/D13979

llvm-svn: 264148
2016-03-23 11:13:54 +00:00
Hrvoje Varga c45baf212a [mips][microMIPS] Delay slot filler modifications
Differential Revision: http://reviews.llvm.org/D18181

llvm-svn: 264147
2016-03-23 10:29:38 +00:00
Valery Pykhtin c0a77c5064 [AMDGPU] Fix missing assembler predicates.
Differential Revision: http://reviews.llvm.org/D18351

llvm-svn: 264137
2016-03-23 04:27:26 +00:00
Tom Stellard 52ecd2d69b AMDGPU: Cache information about register pressure sets
We can statically decide whether or not a register pressure set is for
SGPRs or VGPRs, so we don't need to re-compute this information in
SIRegisterInfo::getRegPressureSetLimit().

Differential Revision: http://reviews.llvm.org/D14805

llvm-svn: 264126
2016-03-23 01:53:22 +00:00
Joerg Sonnenberger 772bb5b65d Typo
llvm-svn: 264110
2016-03-22 22:24:52 +00:00
Dan Gohman 665d7e3838 [WebAssembly] Implement the rotate instructions.
llvm-svn: 264076
2016-03-22 18:01:49 +00:00
Simon Pilgrim 25fb4177fb [X86][SSE] Reapplied: Simplify vector LOAD + EXTEND on pre-SSE41 hardware
Improve vector extension of vectors on hardware without dedicated VSEXT/VZEXT instructions.

We already convert these to SIGN_EXTEND_VECTOR_INREG/ZERO_EXTEND_VECTOR_INREG but can further improve this by using the legalizer instead of prematurely splitting into legal vectors in the combine as this only properly helps for lowering to VSEXT/VZEXT.

Removes a lot of unnecessary any_extend + mask pattern - (Fix for PR25718).

Reapplied with a fix for PR26953 (missing vector widening legalization).

Differential Revision: http://reviews.llvm.org/D17932

llvm-svn: 264062
2016-03-22 16:22:08 +00:00
Daniel Sanders f3599eb683 [mips] Make simm6 consistent with the rest. NFC.
Summary:

Reviewers: vkalintiris

Subscribers: dsanders, llvm-commits

Differential Revision: http://reviews.llvm.org/D18147

llvm-svn: 264057
2016-03-22 14:50:22 +00:00
Daniel Sanders 97297770a6 [mips] Range check simm7.
Summary:
Also renamed li_simm7 to li16_imm since it's not a simm7 and has an unusual
encoding (it's a uimm7 except that 0x7f represents -1).

Reviewers: vkalintiris

Subscribers: dsanders, llvm-commits

Differential Revision: http://reviews.llvm.org/D18145

llvm-svn: 264056
2016-03-22 14:40:00 +00:00
Daniel Sanders 0f17d0da4a [mips] Range check simm5.
Summary:
We can't check the error message for this one because there's another lw/sw
available that covers a larger range. We therefore check the transition
between the two sizes.

Reviewers: vkalintiris

Subscribers: llvm-commits, dsanders

Differential Revision: http://reviews.llvm.org/D18144

llvm-svn: 264054
2016-03-22 14:29:53 +00:00
Daniel Sanders 946dee3b5b [mips] Range check vsplat_uimm[1234568].
Summary:

Reviewers: vkalintiris

Subscribers: dsanders, llvm-commits

Differential Revision: http://reviews.llvm.org/D18143

llvm-svn: 264053
2016-03-22 14:17:41 +00:00
Daniel Sanders 93fa4ce9b7 [mips] Range check uimm4_ptr, remove uimm6_ptr, and use correctly sized immediates in MSA copy/insert.
Reviewers: vkalintiris

Subscribers: dsanders, llvm-commits

Differential Revision: http://reviews.llvm.org/D18142

llvm-svn: 264052
2016-03-22 13:58:53 +00:00
Nicolai Haehnle 0a33abdfd2 AMDGPU: Fix dangling references introduced by r263982
Fixes Valgrind errors on the test cases that were reported as failing
by buildbots.

llvm-svn: 264000
2016-03-21 22:54:02 +00:00
Duncan P. N. Exon Smith 20be876a64 Fix -Wdocumentation warnings from r263853
Thanks to chapuni for catching this.

llvm-svn: 263993
2016-03-21 22:13:44 +00:00
Nicolai Haehnle a56e6b6a53 AMDGPU: Coding style fixes
I meant to add these before committing r263982 as per the review,
but I forgot to squash.

llvm-svn: 263983
2016-03-21 20:39:24 +00:00
Nicolai Haehnle 213e87f2ee AMDGPU: Add SIWholeQuadMode pass
Summary:
Whole quad mode is already enabled for pixel shaders that compute
derivatives, but it must be suspended for instructions that cause a
shader to have side effects (i.e. stores and atomics).

This pass addresses the issue by storing the real (initial) live mask
in a register, masking EXEC before instructions that require exact
execution and (re-)enabling WQM where required.

This pass is run before register coalescing so that we can use
machine SSA for analysis.

The changes in this patch expose a problem with the second machine
scheduling pass: target independent instructions like COPY implicitly
use EXEC when they operate on VGPRs, but this fact is not encoded in
the MIR. This can lead to miscompilation because instructions are
moved past changes to EXEC.

This patch fixes the problem by adding use-implicit operands to
target independent instructions. Some general codegen passes are
relaxed to work with such implicit use operands.

Reviewers: arsenm, tstellarAMD, mareko

Subscribers: MatzeB, arsenm, llvm-commits

Differential Revision: http://reviews.llvm.org/D18162

llvm-svn: 263982
2016-03-21 20:28:33 +00:00
Krzysztof Parzyszek b14f4fd0de [Hexagon] Add handling fixups and instruction relaxation
llvm-svn: 263981
2016-03-21 20:27:17 +00:00
Krzysztof Parzyszek c6f1e1a709 [Hexagon] Properly encode registers in duplex instructions
llvm-svn: 263980
2016-03-21 20:13:33 +00:00
Krzysztof Parzyszek 6514a887f4 [Hexagon] Fix reserving emergency spill slots for register scavenger
- R10 and R11 are not reserved registers.
- Check for reserved registers when finding unused caller-saved registers.

llvm-svn: 263977
2016-03-21 19:57:08 +00:00
Dan Gohman c8d7f14506 [WebAssembly] Implement the eqz instructions.
llvm-svn: 263976
2016-03-21 19:54:41 +00:00
Tom Stellard 92339e888f AMDGPU/SI: Fix threshold calculation for branching when exec is zero
Summary:
When control flow is implemented using the exec mask, the compiler will
insert branch instructions to skip over the masked section when exec is
zero if the section contains more than a certain number of instructions.

The previous code would only count instructions in successor blocks,
and this patch modifies the code to start counting instructions in all
blocks between the start and end of the branch.

Reviewers: nhaehnle, arsenm

Subscribers: arsenm, llvm-commits

Differential Revision: http://reviews.llvm.org/D18282

llvm-svn: 263969
2016-03-21 18:56:58 +00:00
Chad Rosier cf173ffb46 [AArch64] Add a helpful assert. NFC.
llvm-svn: 263965
2016-03-21 18:04:10 +00:00
Matt Arsenault cb38a6bd35 AMDGPU: Remove SignBitIsZero for mubuf scratch offsets
These instructions do not have the same negative base
address problem that DS instructions do on SI.

llvm-svn: 263964
2016-03-21 18:02:18 +00:00
Peter Collingbourne 86b9fbe980 ARM: Better codegen for 64-bit compares.
This introduces a custom lowering for ISD::SETCCE (introduced in r253572)
that allows us to emit a short code sequence for 64-bit compares.

Before:

	push	{r7, lr}
	cmp	r0, r2
	mov.w	r0, #0
	mov.w	r12, #0
	it	hs
	movhs	r0, #1
	cmp	r1, r3
	it	ge
	movge.w	r12, #1
	it	eq
	moveq	r12, r0
	cmp.w	r12, #0
	bne	.LBB1_2
@ BB#1:                                 @ %bb1
	bl	f
	pop	{r7, pc}
.LBB1_2:                                @ %bb2
	bl	g
	pop	{r7, pc}

After:

	push	{r7, lr}
	subs	r0, r0, r2
	sbcs.w	r0, r1, r3
	bge	.LBB1_2
@ BB#1:                                 @ %bb1
	bl	f
	pop	{r7, pc}
.LBB1_2:                                @ %bb2
	bl	g
	pop	{r7, pc}

Saves around 80KB in Chromium's libchrome.so.

Some notes on this patch:

- I don't much like the ARMISD::BRCOND and ARMISD::CMOV combines I
  introduced (nothing else needs them). However, they are necessary in
  order to avoid poor codegen, and they seem similar to existing combines
  in other backends (e.g. X86 combines (brcond (cmp (setcc Compare))) to
  (brcond Compare)).

- No support for Thumb-1. This is in principle possible, but we'd need
  to implement ARMISD::SUBE for Thumb-1.

Differential Revision: http://reviews.llvm.org/D15256

llvm-svn: 263962
2016-03-21 18:00:02 +00:00
Renato Golin 2b6b7ffd6c [ARM] Add Cortex-A32 support
Adding Cortex-A32 as an available target in the ARM backend.

Patch by Sam Parker.

llvm-svn: 263956
2016-03-21 17:29:01 +00:00
Matt Arsenault b96b57347a AMDGPU: Add frexp_mant intrinsic
llvm-svn: 263948
2016-03-21 16:11:05 +00:00
Chad Rosier 4aeab5fbf2 [AArch64] Fix a -Wdocumentation warning. NFC.
llvm-svn: 263942
2016-03-21 13:43:58 +00:00
Jingyue Wu 1375560bdb [NVPTX] Adds a new address space inference pass.
Summary:
The old address space inference pass (NVPTXFavorNonGenericAddrSpaces) is unable
to convert the address space of a pointer induction variable. This patch adds a
new pass called NVPTXInferAddressSpaces that overcomes that limitation using a
fixed-point data-flow analysis (see the file header comments for details).

The new pass is experimental and not enabled by default. Users can turn
it on by setting the -nvptx-use-infer-addrspace flag of llc.

Reviewers: jholewinski, tra, jlebar

Subscribers: jholewinski, llvm-commits

Differential Revision: http://reviews.llvm.org/D17965

llvm-svn: 263916
2016-03-20 20:59:20 +00:00
Simon Pilgrim fcc4532afa [X86][SSE] Tidyup setTargetShuffleZeroElements to match computeZeroableShuffleElements
Based on feedback for D14261

llvm-svn: 263911
2016-03-20 17:43:07 +00:00
Simon Pilgrim c44472a5bc [X86][SSE] Detect zeroable shuffle elements from different value types
Improve computeZeroableShuffleElements to be able to peek through bitcasts to extract zero/undef values from BUILD_VECTOR nodes of different element sizes to the shuffle mask.

Differential Revision: http://reviews.llvm.org/D14261

llvm-svn: 263906
2016-03-20 15:45:42 +00:00
Igor Breger 3ea8af5108 AVX512BW: Enable v32i1/v64i1 BUILD_VECTOR
Differential Revision: http://reviews.llvm.org/D18211

llvm-svn: 263898
2016-03-20 13:09:43 +00:00
Michael Kuperstein 048cc3b7a8 Use a range-based for loop. NFC.
llvm-svn: 263889
2016-03-20 00:16:13 +00:00
Manman Ren a3a019cf90 [CXX_FAST_TLS] Fix issues in ARM.
We need to be careful on which registers can be explicitly handled
via copies. Prologue, Epilogue use physical registers and if one belongs
to the set of CSRsViaCopy, it will no longer be CSRed, since PEI overwrites
it after the explicit copies.

llvm-svn: 263857
2016-03-18 23:44:37 +00:00
Manman Ren 4865d89653 [CXX_FAST_TLS] Disable tail call when calling conventions are mismatched.
Since CXX_FAST_TLS has a bigger set of CSRs, we don't tail call when caller
and callee have mismatched calling conventions.

llvm-svn: 263856
2016-03-18 23:41:51 +00:00
Manman Ren 2828c57b6f [CXX_FAST_TLS] fix issues with O0 on ARM, AArch64 and X86.
Since at O0, explicit copies via SplitCSR may not be removed even if
they are unnecessary, we choose not to use SplitCSR at O0.

llvm-svn: 263855
2016-03-18 23:38:49 +00:00
Duncan P. N. Exon Smith c3fa1eded2 AArch64: Don't modify other modules in AArch64PromoteConstant
Avoid modifying other modules in `AArch64PromoteConstant` when the
constant is `ConstantData` (a horrible accident, I'm sure, caught by an
experimental follow-up to r261464).

Previously, this walked through all the users of a constant, but that
reaches into other modules when the constant doesn't depend transitively
on a `GlobalValue`!  Since we're walking instructions anyway, just
modify the instructions we actually see.

As a drive-by, instead of storing `Use` and getting the instructions
again via `Use::getUser()` (which is not a constantant time lookup),
store `std::pair<Instruction, unsigned>`.  Besides being cheaper, this
makes it easier to drop use-lists form `ConstantData` in the future.
(I threw this in because I was touching all the code anyway.)

Because the patch completely changes the traversal logic, it looks
like a rewrite of the pass, but the core logic is all the same (or
should be, minus the out-of-module changes).  In other words, there
should be NFC as long as the LLVMContext only has a single Module.

I didn't think of a good way to test this, but I hope to submit a patch
eventually that makes walking these use-lists illegal/impossible.

llvm-svn: 263853
2016-03-18 23:30:54 +00:00
Alexei Starovoitov 7e453bb8be BPF: emit an error message for unsupported signed division operation
Signed-off-by: Yonghong Song <yhs@plumgrid.com>
Signed-off-by: Alexei Starovoitov <ast@fb.com>
llvm-svn: 263842
2016-03-18 22:02:47 +00:00
Nicolai Haehnle fa771811b3 AMDGPU: add missing braces around multi-line if block
This fixes an issue with rL263658 pointed out by Tom Stellard.

llvm-svn: 263823
2016-03-18 20:32:04 +00:00
Chad Rosier cdfd7e7201 [AArch64] Enable more load clustering in the MI Scheduler.
This patch adds unscaled loads and sign-extend loads to the TII
getMemOpBaseRegImmOfs API, which is used to control clustering in the MI
scheduler. This is done to create more opportunities for load pairing.  I've
also added the scaled LDRSWui instruction, which was missing from the scaled
instructions. Finally, I've added support in shouldClusterLoads for clustering
adjacent sext and zext loads that too can be paired by the load/store optimizer.

Differential Revision: http://reviews.llvm.org/D18048

llvm-svn: 263819
2016-03-18 19:21:02 +00:00
Nicolai Haehnle 95e8ffd398 AMDGPU: Overload return type of llvm.amdgcn.buffer.load.format
Summary:
Allow the selection of BUFFER_LOAD_FORMAT_x and _XY. Do this now before
the frontend patches land in Mesa. Eventually, we may want to automatically
reduce the size of loads at the LLVM IR level, which requires such overloads,
and in some cases Mesa can generate them directly.

Reviewers: tstellarAMD, arsenm

Subscribers: arsenm, llvm-commits

Differential Revision: http://reviews.llvm.org/D18255

llvm-svn: 263792
2016-03-18 16:24:40 +00:00
Nicolai Haehnle ad63638f6d AMDGPU/SI: Add llvm.amdgcn.buffer.atomic.* intrinsics
Summary:
These intrinsics expose the BUFFER_ATOMIC_* instructions and will be used
by Mesa to implement atomics with buffer semantics. The intrinsic interface
matches that of buffer.load.format and buffer.store.format, except that the
GLC bit is not exposed (it is automatically deduced based on whether the
return value is used).

The change of hasSideEffects is required for TableGen to accept the pattern
that matches the intrinsic.

Reviewers: tstellarAMD, arsenm

Subscribers: arsenm, rivanvx, llvm-commits

Differential Revision: http://reviews.llvm.org/D18151

llvm-svn: 263791
2016-03-18 16:24:31 +00:00
Nicolai Haehnle 3003ba00a3 AMDGPU: use ComplexPattern for offsets in llvm.amdgcn.buffer.load/store.format
Summary:
We cannot easily deduce that an offset is in an SGPR, but the Mesa frontend
cannot easily make use of an explicit soffset parameter either. Furthermore,
it is likely that in the future, LLVM will be in a better position than the
frontend to choose an SGPR offset if possible.

Since there aren't any frontend uses of these intrinsics in upstream
repositories yet, I would like to take this opportunity to change the
intrinsic signatures to a single offset parameter, which is then selected
to immediate offsets or voffsets using a ComplexPattern.

Reviewers: arsenm, tstellarAMD, mareko

Subscribers: arsenm, llvm-commits

Differential Revision: http://reviews.llvm.org/D18218

llvm-svn: 263790
2016-03-18 16:24:20 +00:00
Sam Kolton a74cd526e9 [AMDGPU] Assembler: Change dpp_ctrl syntax to match sp3
Review: http://reviews.llvm.org/D18267
llvm-svn: 263789
2016-03-18 15:35:51 +00:00
Ehsan Amiri 631ed04af0 adding another optimization opportunity to readme file
llvm-svn: 263775
2016-03-18 04:02:25 +00:00
Adam Nemet 709e3046ee [LoopDataPrefetch] Add TTI to limit the number of iterations to prefetch ahead
Summary:
It can hurt performance to prefetch ahead too much.  Be conservative for
now and don't prefetch ahead more than 3 iterations on Cyclone.

Reviewers: hfinkel

Subscribers: llvm-commits, mzolotukhin

Differential Revision: http://reviews.llvm.org/D17949

llvm-svn: 263772
2016-03-18 00:27:43 +00:00
Adam Nemet 6d8beeca53 [LoopDataPrefetch/Aarch64] Allow selective prefetching of large-strided accesses
Summary:
And use this TTI for Cyclone.  As it was explained in the original RFC
(http://thread.gmane.org/gmane.comp.compilers.llvm.devel/92758), the HW
prefetcher work up to 2KB strides.

I am also adding tests for this and the previous change (D17943):

* Cyclone prefetching accesses with a large stride
* Cyclone not prefetching accesses with a small stride
* Generic Aarch64 subtarget not prefetching either

Reviewers: hfinkel

Subscribers: aemerson, rengolin, llvm-commits, mzolotukhin

Differential Revision: http://reviews.llvm.org/D17945

llvm-svn: 263771
2016-03-18 00:27:38 +00:00
Adam Nemet 53e758fc55 [Aarch64] Add pass LoopDataPrefetch for Cyclone
Summary:
This wires up the pass for Cyclone but keeps it off for now because we
need a few more TTIs.

The getPrefetchMinStride value is not very well tuned right now but it
works well with CFP2006/433.milc which motivated this.

Tests will be added as part of the upcoming large-stride prefetching
patch.

Reviewers: t.p.northover

Subscribers: llvm-commits, aemerson, hfinkel, rengolin

Differential Revision: http://reviews.llvm.org/D17943

llvm-svn: 263770
2016-03-18 00:27:29 +00:00
Tim Shen 5cdf75084a [PPC, FastISel] Fix ordered/unordered fcmp
For fcmp, major concern about the following 6 cases is NaN result. The
comparison result consists of 4 bits, indicating lt, eq, gt and un (unordered),
only one of which will be set. The result is generated by fcmpu
instruction. However, bc instruction only inspects one of the first 3
bits, so when un is set, bc instruction may jump to to an undesired
place.

More specifically, if we expect an unordered comparison and un is set, we
expect to always go to true branch; in such case UEQ, UGT and ULT still
give false, which are undesired; but UNE, UGE, ULE happen to give true,
since they are tested by inspecting !eq, !lt, !gt, respectively.

Similarly, for ordered comparison, when un is set, we always expect the
result to be false. In such case OGT, OLT and OEQ is good, since they are
actually testing GT, LT, and EQ respectively, which are false. OGE, OLE
and ONE are tested through !lt, !gt and !eq, and these are true.

llvm-svn: 263753
2016-03-17 22:27:58 +00:00
Tim Northover 498c56c240 ARM: stop asserting on weird <3 x Ty> vectors in ISelLowering.
llvm-svn: 263741
2016-03-17 20:10:28 +00:00
Petar Jovanovic 0b44f24033 [PowerPC] Disable CTR loops optimization for soft float operations
This patch prevents CTR loops optimization when using soft float operations
inside loop body. Soft float operations use function calls, but function
calls are not allowed inside CTR optimized loops.

Patch by Aleksandar Beserminji.

Differential Revision: http://reviews.llvm.org/D17600

llvm-svn: 263727
2016-03-17 17:11:33 +00:00
Derek Schuff d4207ba0f6 [WebAssembly] Stackify code emitted by eliminateFrameIndex and SP writeback
Summary:
MRI::eliminateFrameIndex can emit several instructions to do address
calculations; these can usually be stackified. Because instructions with
FI operands can have subsequent operands which may be expression trees,
find the top of the leftmost tree and insert the code before it, to keep
the LIFO property.

Also use stackified registers when writing back the SP value to memory
in the epilog; it's unnecessary because SP will not be used after the
epilog, and it results in better code.

Differential Revision: http://reviews.llvm.org/D18234

llvm-svn: 263725
2016-03-17 17:00:29 +00:00
Changpeng Fang 234fcb81d3 AMDGPU/SI: Do not generate s_waitcnt after ds_permute/ds_bpermute
Symmary:
  ds_permute/ds_bpermute do not read memory so s_waitcnt is not needed.

Reviewers
  arsenm, tstellarAMD

Subscribers
  llvm-commits, arsenm

Differential Revision:
  http://reviews.llvm.org/D18197

llvm-svn: 263720
2016-03-17 16:43:50 +00:00
Nicolai Haehnle 79cad857a0 AMDGPU: mark atomic instructions as sources of divergence
Summary:
As explained by the comment, threads will typically see different values
returned by atomic instructions even if the arguments are equal.

Reviewers: arsenm, tstellarAMD

Subscribers: arsenm, llvm-commits

Differential Revision: http://reviews.llvm.org/D18156

llvm-svn: 263719
2016-03-17 16:21:59 +00:00
Simon Pilgrim 0f37fbac51 [X86][SSE] Simplified blend-with-zero combining
We were being too aggressive in trying to combine a shuffle into a blend-with-zero pattern, often resulting in a endless loop of contrasting combines

This patch stops the combine if we already have a blend in place (means we miss some domain corrections)

llvm-svn: 263717
2016-03-17 15:59:36 +00:00
Saleem Abdulrasool 071a099102 ARM: Revert SVN r253865, 254158, fix windows division
The two changes together weakened the test and caused a regression with division
handling in MSVC mode.  They were applied to avoid an assertion being triggered
in the block frequency analysis.  However, the underlying problem was simply
being masked rather than solved properly.  Address the actual underlying problem
and revert the changes.  Rather than analyze the cause of the assertion, the
division failure was assumed to be an overflow.

The underlying issue was a subtle bug in the BB construction in the emission of
the div-by-zero check (WIN__DBZCHK).  We did not construct the proper successor
information in the basic blocks, nor did we update the PHIs associated with the
basic block when we split them.  This would result in assertions being triggered
in the block frequency analysis pass.

Although the original tests are being removed, the tests themselves performed
very little in terms of validation but merely tested that we did not assert when
generating code.  Update this with new tests that actually ensure that we do not
regress on the code generation.

llvm-svn: 263714
2016-03-17 14:10:49 +00:00
Simon Atanasyan 58ee875296 [mips] Use `formatImm` call to print immediate value in the `MipsInstPrinter`
That allows, for example, to print hex-formatted immediates using
llvm-objdump --print-imm-hex command line option.

Differential Revision: http://reviews.llvm.org/D18195

llvm-svn: 263704
2016-03-17 10:43:36 +00:00
Scott Egerton d65377da78 [mips] Eliminate instances of "potentially uninitialised local variable" warnings, NFC
Summary:
This should eliminate all occurrences of this within LLVMMipsAsmParser.
This patch is in response to http://reviews.llvm.org/D17983. I was unable
to reproduce the warnings on my machine so please advise if this fixes the
warnings.

Reviewers: ariccio, vkalintiris, dsanders

Subscribers: dblaikie, dsanders, llvm-commits

Differential Revision: http://reviews.llvm.org/D18087

llvm-svn: 263703
2016-03-17 10:37:51 +00:00
James Y Knight f44fc5219f Tweak some atomics functions in preparation for larger changes; NFC.
- Rename getATOMIC to getSYNC, as llvm will soon be able to emit both
  '__sync' libcalls and '__atomic' libcalls, and this function is for
  the '__sync' ones.

- getInsertFencesForAtomic() has been replaced with
  shouldInsertFencesForAtomic(Instruction), so that the decision can be
  made per-instruction. This functionality will be used soon.

- emitLeadingFence/emitTrailingFence are no longer called if
  shouldInsertFencesForAtomic returns false, and thus don't need to
  check the condition themselves.

llvm-svn: 263665
2016-03-16 22:12:04 +00:00
Nicolai Haehnle ef160de3e5 AMDGPU: Prevent uniform loops from becoming infinite
Summary:
Uniform loops where the branch leaving the loop is predicated on VCCNZ
must be skipped if EXEC = 0, otherwise they will be infinite.

Reviewers: tstellarAMD, arsenm

Subscribers: arsenm, llvm-commits

Differential Revision: http://reviews.llvm.org/D18137

llvm-svn: 263658
2016-03-16 20:14:33 +00:00
Colin LeMahieu bb0cdfb9f7 [Hexagon] Adding missing break in switch statement. Extra operands would have been appended to the end.
llvm-svn: 263657
2016-03-16 20:00:38 +00:00
Sanjay Patel be37e62e0c fix function names; NFC
llvm-svn: 263646
2016-03-16 18:00:09 +00:00
Michel Danzer 302f83ac4e AMDGPU: Verify instructions in non-debug builds as well
And emit an error if it fails.

This prevents illegal instructions from getting sent to the GPU, which
would potentially result in a hang.

This is a candidate for the stable branch(es).

Reviewed-by: Marek Olšák <marek.olsak@amd.com>
llvm-svn: 263627
2016-03-16 09:10:42 +00:00
Michel Danzer beb79ceb19 AMDGPU/SI: Clean up indentation in SIInstrInfo::getDefaultRsrcDataFormat
Reviewed-by: Tom Stellard <thomas.stellard@amd.com>
llvm-svn: 263626
2016-03-16 09:10:35 +00:00
Igor Breger 0ba7b04f5f AVX512BW: Fix SRA v64i8 lowering. Use PCMPGTM (cmp result in k register) for 512bit vector because PCMPGT supported only for 128/256bit.
Differential Revision: http://reviews.llvm.org/D18204

llvm-svn: 263624
2016-03-16 08:48:26 +00:00
Davide Italiano dfdf278ebf [MC] Rename TLSDESC as it's not ARM specific.
Similarly to what was done for TLSCALL in r263515.

llvm-svn: 263564
2016-03-15 17:29:52 +00:00
Changpeng Fang 01f6062227 AMDGPU/SI: Implement GroupStaticSize Intrinsic for Dynamic LDS
Summary:
  Static LDS size is saved in MachineFunctionInfo::LDSSize,
We define a pseudo instruction with usesCustomInserter bit set. Then, in EmitInstrWithCustomInserter,
we replace this pseudo instruction with a mov of MachineFunctionInfo::LDSSize.

Reviewers:
    arsenm
    tstellarAMD

Subscribers
    llvm-commits, arsenm

Differential Revision:
   http://reviews.llvm.org/D18064

llvm-svn: 263563
2016-03-15 17:28:44 +00:00
Douglas Katzman 708eeb0519 Myriad: Add new sparc CPU kinds.
llvm-svn: 263557
2016-03-15 16:41:47 +00:00
Davide Italiano 249c45d92e [MC] Rename TLSCALL as it's not ARM specific.
`MCSymbolRefExpr` variant kind for TLSCALL is prefixed with 
_ARM_ since this is how it was originally implemented.
The X86_64 version is exactly the same so there's no reason
to create a new variant, we can just rename the existing
one to be machine-independent.
This generalization is the first step to implement support
for GNU2 TLS dialect in MC.

Differential Revision:  http://reviews.llvm.org/D18160

llvm-svn: 263515
2016-03-15 00:25:22 +00:00
Eric Christopher da8b3f1914 Temporarily Revert "[X86][SSE] Simplify vector LOAD + EXTEND on
pre-SSE41 hardware" as it seems to be causing crashes during code
generation in halide. PR forthcoming.

This reverts commit r263303.

llvm-svn: 263512
2016-03-14 23:59:57 +00:00
Ulrich Weigand 52aa7fba3f [SystemZ] Add missing isBranch flags to certain instruction
Some instructions were missing isBranch, isCall, or isTerminator
flags.  This didn't really affect code generation since most of
the affected patterns were used only for the AsmParser and/or
disassembler.

However, it could affect tools using the MC layer to disassemble
and parse binary code (e.g. via MCInstrDesc::mayAffectControlFlow).

llvm-svn: 263478
2016-03-14 20:16:30 +00:00
Chad Rosier 27c352d26d [AArch64] Refactor AArch64FrameLowering::emitPrologue. NFC.
http://reviews.llvm.org/D18125
Patch by Aditya Kumar.

llvm-svn: 263461
2016-03-14 18:24:34 +00:00
Chad Rosier 6d98655070 [AArch64] Break the dependency between FP and SP when possible.
When the SP in not changed because of realignment/VLAs etc., we restore the SP
by using the previous value of SP and not the FP. Breaking the dependency will
help in cases when the epilog of a callee is close to the epilog of the caller;
for then "sub sp, fp, #" depends on the load restoring the FP in the epilog of
the callee.

http://reviews.llvm.org/D18060
Patch by Aditya Kumar and Evandro Menezes.

llvm-svn: 263458
2016-03-14 18:17:41 +00:00
Chad Rosier 7a21bb196b [Mips] Fix -Wunused-private-field warning after r263444.
llvm-svn: 263454
2016-03-14 18:10:20 +00:00
Sanjay Patel 7506852709 [DAG] use !isUndef() ; NFCI
llvm-svn: 263453
2016-03-14 18:09:43 +00:00
Sanjay Patel 5719584129 [DAG] use isUndef() ; NFCI
llvm-svn: 263448
2016-03-14 17:28:46 +00:00
Tom Stellard 331f981cc9 AMDGPU/SI: Handle wait states required for DPP instructions
Reviewers: arsenm

Subscribers: arsenm, llvm-commits

Differential Revision: http://reviews.llvm.org/D17543

llvm-svn: 263447
2016-03-14 17:05:56 +00:00
Sanjay Patel 62d707c8d9 [x86, AVX] replace masked load with full vector load when possible
Converting masked vector loads to regular vector loads for x86 AVX should always be a win.
I raised the legality issue of reading the extra memory bytes on llvm-dev. I did not see any
objections.

1. x86 already does this kind of optimization for multiple scalar loads -> vector load.
2. If other targets have the same flexibility, we could move this transform up to CGP or DAGCombiner.

Differential Revision: http://reviews.llvm.org/D18094

llvm-svn: 263446
2016-03-14 16:54:43 +00:00
Daniel Sanders e8efff373a [mips] MIPS32R6 compact branch support
Summary:
MIPSR6 introduces a class of branches called compact branches. Unlike the
traditional MIPS branches which have a delay slot, compact branches do not
have a delay slot. The instruction following the compact branch is only
executed if the branch is not taken and must not be a branch.

It works by generating compact branches for MIPS32R6 when the delay slot
filler cannot fill a delay slot. Then, inspecting the generated code for
forbidden slot hazards (a compact branch with an adjacent branch or other
CTI) and inserting nops to clear this hazard.

Patch by Simon Dardis.

Reviewers: vkalintiris, dsanders

Subscribers: MatzeB, dsanders, llvm-commits

Differential Revision: http://reviews.llvm.org/D16353

llvm-svn: 263444
2016-03-14 16:24:05 +00:00
Marek Olsak ed2213e6ef AMDGPU/SI: Incomplete shader binaries need to finish execution at the end
Reviewers: tstellarAMD, arsenm

Subscribers: arsenm

Differential Revision: http://reviews.llvm.org/D18058

llvm-svn: 263441
2016-03-14 15:57:14 +00:00
Nicolai Haehnle 74127fe8d7 AMDGPU: mark llvm.amdgcn.image.atomic.* as a source of divergence
Summary:
When multiple threads perform an atomic op with the same arguments, they
will usually see different return values.

Reviewers: arsenm, tstellarAMD

Subscribers: arsenm, llvm-commits

Differential Revision: http://reviews.llvm.org/D18101

llvm-svn: 263440
2016-03-14 15:37:18 +00:00
Vasileios Kalintiris 42db3ff47f [mips] Use range-based for loops. NFC.
llvm-svn: 263438
2016-03-14 15:05:30 +00:00
Ulrich Weigand cdce026b4d [SystemZ] Avoid LER on z13 due to partial register dependencies
On the z13, it turns out to be more efficient to access a full
floating-point register than just the upper half (as done e.g.
by the LE and LER instructions).

Current code already takes this into account when loading from
memory by using the LDE instruction in place of LE.  However,
we still generate LER, which shows the same performance issues
as LE in certain circumstances.

This patch changes the back-end to emit LDR instead of LER to
implement FP32 register-to-register copies on z13.

llvm-svn: 263431
2016-03-14 13:50:03 +00:00
Zlatko Buljan fba68931ed [mips] Fix an issue with long double when function roundl is defined
Differential Revision: http://reviews.llvm.org/D17760

llvm-svn: 263428
2016-03-14 12:50:23 +00:00
Daniel Sanders 127d2d2b46 [mips] Range check uimm16_64
Summary:

Reviewers: vkalintiris

Subscribers: llvm-commits, dsanders

Differential Revision: http://reviews.llvm.org/D17725

llvm-svn: 263427
2016-03-14 12:44:44 +00:00
Daniel Sanders cfa3483c8e [mips] Simplify ordering of range checked immediate classes.
Summary:
With the addition of checks to ensure that operands have a strict ordering
it has become tricky to manage the order in the way I originally intended.

This patch linearizes the ordering which simplifies the implementation but
requires an order that is arbitrary in places. Here are some examples:
* uimm4 < uimm5 < uimm6
* simm4 < uimm4 < simm5 < uimm5
* uimm5 < uimm5_plus1 (1..32) < uimm5_plus32 (32..63) < uimm6
  The term 'superset' starts to break down here since the *_plus* classes
  are not true supersets of uimm5 (but they are still subsets of uimm6).
* uimm5 < uimm5_64, and uimm5 < vsplat_uimm5
  This is entirely arbitrary. We need an ordering and what we pick is
  unimportant since only one is possible for a given mnemonic.

Reviewers: vkalintiris

Subscribers: llvm-commits, dsanders

Differential Revision: http://reviews.llvm.org/D17723

llvm-svn: 263423
2016-03-14 11:46:30 +00:00
Nikolay Haustov 79af6b33e0 [AMDGPU] Assembler: SOP* instruction fixes
s_bitset0_b64, s_bitset1_b64 has 32-bit src0, not 64-bit.
s_rfe_b64 has just one destination operand and no source.
Uncomment S_BITCMP* and S_SETVSKIP, adjust SOPC_* classes for that.
Add s_memrealtime test and change comments in smem.s to follow common style.
Change test for s_memtime to use non-zero register to make it really test encoding.
Add tests for s_buffer_load*.
Add tests for SOPC instructions (same for SI and VI)

Differential Revision: http://reviews.llvm.org/D18040

llvm-svn: 263420
2016-03-14 11:17:19 +00:00
Daniel Sanders 19b7f76afa [mips] Range check uimm6_lsl2.
Summary:

Reviewers: vkalintiris

Subscribers: dsanders, llvm-commits

Differential Revision: http://reviews.llvm.org/D17291

llvm-svn: 263419
2016-03-14 11:16:56 +00:00
Hans Wennborg 369ebfe4c9 Try to fix build of WebAssemblyRegStackify.cpp on Windows
It's failing to build on VS2015 with:

C:\b\build\slave\ClangToTWin\build\src\third_party\llvm\lib\Target\WebAssembly\WebAssemblyRegStackify.cpp(520):
error C2668: 'llvm::make_reverse_iterator': ambiguous call to overloaded function
C:\b\build\slave\ClangToTWin\build\src\third_party\llvm\include\llvm/ADT/STLExtras.h(217):
note: could be 'std::reverse_iterator<llvm::MachineBasicBlock::iterator>
llvm::make_reverse_iterator<llvm::MachineInstrBundleIterator<llvm::MachineInstr>>(IteratorTy)'
        with
        [
            IteratorTy=llvm::MachineInstrBundleIterator<llvm::MachineInstr>
        ]
C:\b\depot_tools\win_toolchain\vs_files\391bbf1220d3edcd3cc3fccdb56224181e3b13a7\win_sdk\bin\..\..\VC\include\xutility(1217):
note: or 'std::reverse_iterator<llvm::MachineBasicBlock::iterator>
std::make_reverse_iterator<llvm::MachineInstrBundleIterator<llvm::MachineInstr>>(_RanIt)' [found using argument-dependent lookup]
        with
        [
            _RanIt=llvm::MachineInstrBundleIterator<llvm::MachineInstr>
        ]

I don't have VS2015 locally at the moment, but hopefully this will help.

llvm-svn: 263418
2016-03-14 11:04:15 +00:00
Igor Breger a949100532 AVX512: icmp operation should be always lowered to CMPM (AVX-512) instruction on SKX.
implemented by delena

Differential Revision: http://reviews.llvm.org/D18054

llvm-svn: 263417
2016-03-14 10:26:39 +00:00
Valery Pykhtin 0f97f17152 [AMDGPU] AsmParser: Factor out parseRegister. NFC.
llvm-svn: 263411
2016-03-14 07:43:42 +00:00
Valery Pykhtin 9e33c7f5d3 [AMDGPU] AsmParser: refactor post push_back vector access. NFC.
llvm-svn: 263409
2016-03-14 05:25:44 +00:00
Valery Pykhtin f91911c3ae [AMDGPU] AsmParser: remove redundant isReg checks. NFC.
llvm-svn: 263407
2016-03-14 05:01:45 +00:00
Mehdi Amini ba9fba81d6 Remove PreserveNames template parameter from IRBuilder
This reapplies r263258, which was reverted in r263321 because
of issues on Clang side.

From: Mehdi Amini <mehdi.amini@apple.com>
llvm-svn: 263393
2016-03-13 21:05:13 +00:00
Simon Pilgrim 035b19ecf5 [X86][SSE41] Avoid variable blend for constant v8i16 shifts
The SSE41 v8i16 shift lowering using (v)pblendvb is great for non-constant shift amounts, but if it is constant then we can efficiently reduce the VSELECT to shuffles with the pre-SSE41 lowering.

llvm-svn: 263383
2016-03-13 18:35:59 +00:00
Craig Topper 955308fbee [X86] Remove many operands that represent memory stores from outs to ins. These operands are the registers and immediates that specify the memory address not the memory itself thus they are inputs.
llvm-svn: 263354
2016-03-13 02:56:31 +00:00
Nemanja Ivanovic bd56e4e25a Fix for PR 26378
This patch corresponds to review:
http://reviews.llvm.org/D17712

We were not clearing the TOC vector in PPCAsmPrinter when initializing it. This
caused duplicate definition asserts when the pass is reused on the module
(i.e. with -compile-twice or in JIT contexts).

llvm-svn: 263338
2016-03-12 10:23:07 +00:00
Quentin Colombet cf9732b417 [X86] Make sure we do not clobber RBX with cmpxchg when used as a base pointer.
cmpxchg[8|16]b uses RBX as one of its argument.
In other words, using this instruction clobbers RBX as it is defined to hold one
the input. When the backend uses dynamically allocated stack, RBX is used as a
reserved register for the base pointer. 

Reserved registers have special semantic that only the target understands and
enforces, because of that, the register allocator don’t use them, but also,
don’t try to make sure they are used properly (remember it does not know how
they are supposed to be used).

Therefore, when RBX is used as a reserved register but defined by something that
is not compatible with that use, the register allocator will not fix the
surrounding code to make sure it gets saved and restored properly around the
broken code. This is the responsibility of the target to do the right thing with
its reserved register.

To fix that, when the base pointer needs to be preserved, we use a different
pseudo instruction for cmpxchg that save rbx.
That pseudo takes two more arguments than the regular instruction:
- One is the value to be copied into RBX to set the proper value for the
  comparison.
- The other is the virtual register holding the save of the value of RBX as the
  base pointer. This saving is done as part of isel (i.e., we emit a copy from
  rbx).

cmpxchg_save_rbx <regular cmpxchg args>, input_for_rbx_reg, save_of_rbx_as_bp

This gets expanded into:
rbx = copy input_for_rbx_reg
cmpxchg <regular cmpxchg args>
rbx = save_of_rbx_as_bp

Note: The actual modeling of the pseudo is a bit more complicated to make sure
the interferes that appears after the pseudo gets expanded are properly modeled
before that expansion.

This fixes PR26883.

llvm-svn: 263325
2016-03-12 02:25:27 +00:00
Eric Christopher 35abd051c0 Temporarily revert:
commit ae14bf6488e8441f0f6d74f00455555f6f3943ac
Author: Mehdi Amini <mehdi.amini@apple.com>
Date:   Fri Mar 11 17:15:50 2016 +0000

    Remove PreserveNames template parameter from IRBuilder

    Summary:
    Following r263086, we are now relying on a flag on the Context to
    discard Value names in release builds.

    Reviewers: chandlerc

    Subscribers: mzolotukhin, llvm-commits

    Differential Revision: http://reviews.llvm.org/D18023

    From: Mehdi Amini <mehdi.amini@apple.com>

    git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@263258
    91177308-0d34-0410-b5e6-96231b3b80d8

until we can figure out what to do about clang and Release build testing.

This reverts commit 263258.

llvm-svn: 263321
2016-03-12 01:47:22 +00:00
Simon Pilgrim 33d57c7547 [X86][SSE] Simplify vector LOAD + EXTEND on pre-SSE41 hardware
Improve vector extension of vectors on hardware without dedicated VSEXT/VZEXT instructions.

We already convert these to SIGN_EXTEND_VECTOR_INREG/ZERO_EXTEND_VECTOR_INREG but can further improve this by using the legalizer instead of prematurely splitting into legal vectors in the combine as this only properly helps for lowering to VSEXT/VZEXT.

Removes a lot of unnecessary any_extend + mask pattern - (Fix for PR25718).

Differential Revision: http://reviews.llvm.org/D17932

llvm-svn: 263303
2016-03-11 22:18:05 +00:00
Ahmed Bougacha 171f7b9986 [AArch64] Don't blindly lower f16/f128 FCCMPs.
Instead, extend f16 (like we do when lowering a standalone SETCC),
and let f128 be legalized to the RT calls.

Fixes PR26803.

llvm-svn: 263301
2016-03-11 22:02:58 +00:00
Dan Gohman da323e88ea [WebAssembly] Add `final` keywords to a few more subclasses, for consistency.
llvm-svn: 263287
2016-03-11 19:45:37 +00:00
Simon Pilgrim 7b2164ffe0 Fix spelling.
llvm-svn: 263266
2016-03-11 17:31:43 +00:00
Mehdi Amini 99eab3dd06 Remove PreserveNames template parameter from IRBuilder
Summary:
Following r263086, we are now relying on a flag on the Context to
discard Value names in release builds.

Reviewers: chandlerc

Subscribers: mzolotukhin, llvm-commits

Differential Revision: http://reviews.llvm.org/D18023

From: Mehdi Amini <mehdi.amini@apple.com>
llvm-svn: 263258
2016-03-11 17:15:50 +00:00
Valery Pykhtin a7f480b4e9 [AMDGPU] Fix VOPC instruction operand namings
Differential Revision: http://reviews.llvm.org/D17966

llvm-svn: 263242
2016-03-11 14:53:28 +00:00
Simon Pilgrim 7ca9614c71 [X86][AVX] Fixed issue where a long chain of shuffles could attempt to combine to a single (illegal) PSHUFB instruction.
Its not enough that we test for SSSE3 - that's only OK for 128-bit vectors - we also need to test for AVX2 / AVX512BW for 256/512 bit vector cases.

llvm-svn: 263239
2016-03-11 14:39:10 +00:00
Vasileios Kalintiris e2cbc21b6f [mips] MIPSR6 Instruction itineraries
Summary: Defines instruction itineraries for common MIPSR6 instructions.

Patch by Simon Dardis.

Reviewers: vkalintiris

Subscribers: MatzeB, dsanders, llvm-commits

Differential Revision: http://reviews.llvm.org/D17198

llvm-svn: 263229
2016-03-11 13:05:06 +00:00
Daniel Sanders 78e8902097 [mips] Range check simm4.
Summary:

Reviewers: vkalintiris

Subscribers: dsanders, llvm-commits

Differential Revision: http://reviews.llvm.org/D16811

llvm-svn: 263220
2016-03-11 11:37:50 +00:00
Nikolay Haustov 6560781c4f [AMDGPU] Assembler: change v_madmk operands to have same order as mad.
The constant is now at source operand 1 (previously at 2).
This is also how it is in legacy AMD sp3 assembler.
Update tests.

Differential Revision: http://reviews.llvm.org/D17984

llvm-svn: 263212
2016-03-11 09:27:25 +00:00
Chandler Carruth 89c45a162f [PM] Port GVN to the new pass manager, wire it up, and teach a couple of
tests to run GVN in both modes.

This is mostly the boring refactoring just like SROA and other complex
transformation passes. There is some trickiness in that GVN's
ValueNumber class requires hand holding to get to compile cleanly. I'm
open to suggestions about a better pattern there, but I tried several
before settling on this. I was trying to balance my desire to sink as
much implementation detail into the source file as possible without
introducing overly many layers of abstraction.

Much like with SROA, the design of this system is made somewhat more
cumbersome by the need to support both pass managers without duplicating
the significant state and logic of the pass. The same compromise is
struck here.

I've also left a FIXME in a doxygen comment as the GVN pass seems to
have pretty woeful documentation within it. I'd like to submit this with
the FIXME and let those more deeply familiar backfill the information
here now that we have a nice place in an interface to put that kind of
documentaiton.

Differential Revision: http://reviews.llvm.org/D18019

llvm-svn: 263208
2016-03-11 08:50:55 +00:00
Matt Arsenault bafc9dc591 AMDGPU: Don't use InstVisitor for AMDGPUPromoteAlloca
Frontend authors are strongly encouraged to keep allocas
in the entry block, so don't bother visiting every instruction
in the other blocks of the function.

llvm-svn: 263206
2016-03-11 08:20:50 +00:00
Matt Arsenault 6b6a2c37bc AMDGPU: R600 code splitting cleanup
Move a few functions only used by R600 to R600 specific code,
fix header macros to stop using R600, mark classes as final.

llvm-svn: 263204
2016-03-11 08:00:27 +00:00
Matt Arsenault 9a19c240c0 AMDGPU: Materialize sign bits with bfrev
If a constant is the same as the reverse of an inline immediate,
this is 4 bytes smaller than having to embed a 32-bit literal.

llvm-svn: 263201
2016-03-11 07:42:49 +00:00
Tim Northover 6092de5075 AArch64: only try to use scaled fcvt ops on legal vector types.
Before we ended up calling getSimpleVectorType on a <3 x float>, which
asserted.

llvm-svn: 263169
2016-03-10 23:02:21 +00:00
Sanjay Patel 0181943b89 [x86] don't use a shuffle when a vselect will do; NFCI
Looking at the IR definition of a masked load made me realize
there was no reason to use a shuffle here, so we don't need
to convert the format of the mask at all.

llvm-svn: 263167
2016-03-10 22:35:33 +00:00
Simon Pilgrim 61eb49e437 [X86][SSE] Reapplied: Improve vector ZERO_EXTEND by combining to ZERO_EXTEND_VECTOR_INREG
Generalise the existing SIGN_EXTEND to SIGN_EXTEND_VECTOR_INREG combine to support zero extension as well and get rid of a lot of unnecessary ANY_EXTEND + mask patterns.

Reapplied with a fix for PR26870 (avoid premature use of TargetConstant in ZERO_EXTEND_VECTOR_INREG expansion).

Differential Revision: http://reviews.llvm.org/D17691

llvm-svn: 263159
2016-03-10 20:40:26 +00:00
Peter Collingbourne aba16fca5d ARM: Support relative references using the PREL31 symbol variant.
Differential Revision: http://reviews.llvm.org/D17937

llvm-svn: 263156
2016-03-10 19:30:18 +00:00
Tim Northover 00e2dcec02 AArch64: remove pseudo-instructions used only for their patterns.
There's no real reason for these pseudos to exist, we should be writing real
patterns even if it is slightly less convenient. NFC.

llvm-svn: 263141
2016-03-10 18:46:12 +00:00
Nicolai Haehnle b142770bfe AMDGPU/SI: add llvm.amdgcn.buffer.load/store.format intrinsics
Summary:
They correspond to BUFFER_LOAD/STORE_FORMAT_XYZW and will be used by Mesa
to implement the GL_ARB_shader_image_load_store extension.

The intention is that for llvm.amdgcn.buffer.load.format, LLVM will decide
whether one of the _X/_XY/_XYZ opcodes can be used (similar to image sampling
and loads). However, this is not currently implemented.

For llvm.amdgcn.buffer.store, LLVM cannot decide to use one of the "smaller"
opcodes and therefore the intrinsic is overloaded. Currently, only the v4f32
is actually implemented since GLSL also only has a vec4 variant of the store
instructions, although it's conceivable that Mesa will want to be smarter
about this in the future.

BUFFER_LOAD_FORMAT_XYZW is already exposed via llvm.SI.vs.load.input, which
has a legacy name, pretends not to access memory, and does not capture the
full flexibility of the instruction.

Reviewers: arsenm, tstellarAMD, mareko

Subscribers: arsenm, llvm-commits

Differential Revision: http://reviews.llvm.org/D17277

llvm-svn: 263140
2016-03-10 18:43:50 +00:00
Michael Kuperstein 8be8de6d62 [X86] Correctly select registers to pop into for x86_64
When trying to replace an add to esp with pops, we need to choose dead
registers to pop into. Registers clobbered by the call and not imp-def'd
by it should be safe. Except that it's not enough to check the register
itself isn't defined, we also need to make sure no overlapping registers
are defined either.

This fixes PR26711.

Differential Revision: http://reviews.llvm.org/D18029

llvm-svn: 263139
2016-03-10 18:43:21 +00:00
Balaram Makam e9b2725287 [AArch64] Optimize compare and branch sequence when the compare's constant operand is power of 2
Summary:
Peephole optimization that generates a single TBZ/TBNZ instruction
for test and branch sequences like in the example below. This handles
the cases that miss folding of AND into TBZ/TBNZ during ISelLowering of BR_CC

Examples:
   and  w8, w8, #0x400
   cbnz w8, L1
 to
   tbnz w8, #10, L1

Reviewers: MatzeB, jmolloy, mcrosier, t.p.northover

Subscribers: aemerson, rengolin, mcrosier, llvm-commits

Differential Revision: http://reviews.llvm.org/D17942

llvm-svn: 263136
2016-03-10 17:54:55 +00:00
Alexandros Lamprineas 843164242e [ARM] Cortex-R8 support
This patch adds Cortex-R8 to Target Parser and TableGen.
It also adds CodeGen tests for the build attributes.

Patch by Pablo Barrio.

Differential Revision: http://reviews.llvm.org/D17925

llvm-svn: 263132
2016-03-10 17:38:41 +00:00
Changpeng Fang 278a5b31a5 AMDGPU/SI: Define S_GETREG Intrinsic
Summary:
 Define s_getreg intrinsic to generate s_getreg instruction to read
hardware registers.

Reviewers: tstellarAMD, arsenm

Subscribers: llvm-commits, arsenm

Differential Revision: http://reviews.llvm.org/D17892

llvm-svn: 263124
2016-03-10 16:47:15 +00:00
Saleem Abdulrasool 1632fe1f77 ARM: follow up improvements for SVN r263118
The initial change was insufficiently complete for always getting the semantics
of __builtin_longjmp correct.  The builtin is translated into a
`tInt_eh_sjlj_longjmp` DAG node.  This node set R7 as clobbered.  However, the
code would then follow up with a clobber of R11.  I had failed to notice the
imp-def,kill on R7 in the isel.  Unfortunately, it seems that it is not possible
to conditionalise the Defs list via an !if.  Instead, construct a new parallel
WIN node and prefer that when targeting windows.  This ensures that we now both
correctly model the __builtin_longjmp as well as construct the frame in a more
ABI conformant manner.

llvm-svn: 263123
2016-03-10 16:26:37 +00:00
David L Kreitzer 14f0077f38 Unified the handling of returns in the X87 stackifier so that the stackifier
runs successfully on routines containing IRETs. This fixes PR26410.

Differential Revision: http://reviews.llvm.org/D17643

llvm-svn: 263120
2016-03-10 15:14:02 +00:00
Saleem Abdulrasool 8b30f9854e ARM: correct __builtin_longjmp on WoA
WoA uses r11 as the FP even though it is a pure thumb-2 environment in contrast
to AAPCS which states r7.  This adjusts __builtin_longjmp to not clobber r7 and
to properly restore the frame pointer on execution.

llvm-svn: 263118
2016-03-10 15:11:09 +00:00
Elena Demikhovsky cd9967d160 AVX-512: Fixed a bug in i1 vector zero extending. (Skylake-avx512)
(failed on instruction selection phase)

Differential Revision: http://reviews.llvm.org/D17924

llvm-svn: 263111
2016-03-10 13:44:22 +00:00
Valery Pykhtin a4db224d54 [AMDGPU] Fix SMEM instructions encoding/operand namings
Differential Revision: http://reviews.llvm.org/D17651

llvm-svn: 263108
2016-03-10 13:06:08 +00:00
Simon Pilgrim 13d4056795 [X86][AVX] Improve target shuffle combining of BLEND+zero
The BLEND+zero combine was failing to combine equivalent BLEND masks.

Follow up to D17483 and D17858

llvm-svn: 263105
2016-03-10 11:50:15 +00:00
Simon Pilgrim 16d11785a5 [X86][SSE] Basic combining of unary target shuffles of binary target shuffles.
This patch reorders the combining of target shuffle masks so that when a unary shuffle takes a binary shuffle as its input but only references one of its inputs it can correctly combine into a unary shuffle mask.

This is starting to encroach on the purpose of resolveTargetShuffleInputs, but I don't want to remove it until we definitely know we won't need it for full binary shuffle combining.

There is a lot more work before we can properly support binary target shuffle masks but this was an easy case to add support for.

Differential Revision: http://reviews.llvm.org/D17858

llvm-svn: 263102
2016-03-10 11:23:51 +00:00
Elena Demikhovsky 38f78a2b92 AVX-512: Fixed a bug in shuffle for v64i8 type
Operation SCALAR_TO_VECTOR for v64i8 and v32i16 should be lowered if BW feature is "on".

Differential Revision: http://reviews.llvm.org/D17994

llvm-svn: 263097
2016-03-10 08:32:09 +00:00
Roman Levenstein 2792b3f02f Add support for a preserve_most calling convention to the AArch64 backend.
This change adds a support for a preserve_most calling convention to the AArch64 backend, similar to how it was done for X86-64.

There is also a subsequent patch on top of this one to add a tail-calls support for this calling convention.

Differential Revision: http://reviews.llvm.org/D18016

llvm-svn: 263092
2016-03-10 04:35:09 +00:00
Sanjay Patel 9f6c4d50b4 [x86] fix cost model inaccuracy for vector memory ops
The irony of this patch is that one CPU that is affected is AMD Jaguar, and Jaguar
has a completely double-pumped AVX implementation. But getting the cost model to
reflect that is a much bigger problem. The small goal here is simply to improve on
the lie that !AVX2 == SandyBridge.

Differential Revision: http://reviews.llvm.org/D18000

llvm-svn: 263069
2016-03-09 22:23:33 +00:00
Derek Schuff 3e89580571 [WebAssembly] Update known gcc test failures
llvm-svn: 263068
2016-03-09 22:14:33 +00:00
Sanjay Patel 4a8dd89128 [x86, AVX] optimize masked loads with constant masks
Instead of a variable-blend instruction, form a blend with immediate because those are always cheaper.

Differential Revision: http://reviews.llvm.org/D17899

llvm-svn: 263067
2016-03-09 22:12:08 +00:00
Evandro Menezes 669aaccb89 [AArch64] Minor reformatting (NFC).
llvm-svn: 263054
2016-03-09 19:56:38 +00:00
Chris Dewhurst 52adb575e6 This change adds co-processor condition branching and conditional traps to the Sparc back-end.
This will allow inline assembler code to utilize these features, but no automatic lowering is provided, except for the previously provided @llvm.trap, which lowers to "ta 5".

The change also separates out the different assembly language syntaxes for V8 and V9 Sparc. Previously, only V9 Sparc assembly syntax was provided.

The change also corrects the selection order of trap disassembly, allowing, e.g. "ta %g0 + 15" to be rendered, more readably, as "ta 15", ignoring the %g0 register. This is per the sparc v8 and v9 manuals.

Check-in includes many extra unit tests to check this works correctly on both V8 and V9 Sparc processors.

Code Reviewed at http://reviews.llvm.org/D17960.

llvm-svn: 263044
2016-03-09 18:20:21 +00:00
Kit Barton a1d6a6f1de [PPC] backend changes to generate xvabs[s,d]p and xvnabs[s,d]p instructions
This has to be committed before the FE changes

Phabricator: http://reviews.llvm.org/D17837
llvm-svn: 263035
2016-03-09 17:48:01 +00:00
Chad Rosier e4e15ba046 [AArch64] Move helper functions into TII, so they can be reused elsewhere. NFC.
llvm-svn: 263032
2016-03-09 17:29:48 +00:00
Chad Rosier 0da267dd1d [AArch64] Minor cleanup/remove redundant code. NFC.
llvm-svn: 263024
2016-03-09 16:46:48 +00:00
Chad Rosier c27a18f39f [TII] Allow getMemOpBaseRegImmOfs() to accept negative offsets. NFC.
http://reviews.llvm.org/D17967

llvm-svn: 263021
2016-03-09 16:00:35 +00:00
Teresa Johnson e50b23c67f Fix build error due to unsigned compare >= 0 in r263008 (NFC)
Fixes error from building with clang:

/usr/local/google/home/tejohnson/llvm/llvm_15/lib/Target/AMDGPU/InstPrinter/AMDGPUInstPrinter.cpp:407:12:
error: comparison of unsigned expression >= 0 is always true
[-Werror,-Wtautological-compare]
  if ((Imm >= 0x000) && (Imm <= 0x0ff)) {
         ~~~ ^  ~~~~~

llvm-svn: 263014
2016-03-09 14:58:23 +00:00
Sam Kolton dfa29f7c5b [AMDGPU] Assembler: Support DPP instructions.
Supprot DPP syntax as used in SP3 (except several operands syntax).
Added dpp-specific operands in td-files.
Added DPP flag to TSFlags to determine if instruction is dpp in InstPrinter.
Support for VOP2 DPP instructions in td-files.
Some tests for DPP instructions.

ToDo:
  - VOP2bInst:
    - vcc is considered as operand
    - AsmMatcher doesn't apply mnemonic aliases when parsing operands
  - v_mac_f32
  - v_nop
  - disable instructions with 64-bit operands
  - change dpp_ctrl assembler representation to conform sp3

Review: http://reviews.llvm.org/D17804
llvm-svn: 263008
2016-03-09 12:29:31 +00:00
Nikolay Haustov 9b7577ed22 [AMDGPU] Assembler: Support abs() syntax.
Support legacy SP3 abs(v1) syntax. InstPrinter still uses |v1|.
Add tests.

Differential Revision: http://reviews.llvm.org/D17887

llvm-svn: 263006
2016-03-09 11:03:21 +00:00
Nikolay Haustov 8e3f099497 [AMDGPU] Assembler: Fix s_setpc_b64
s_setpc_b64 has just one 64-bit source which is the address of instruction to jump to.

Differential Revision: http://reviews.llvm.org/D17888

llvm-svn: 263005
2016-03-09 10:56:19 +00:00
Dan Gohman ddfa1a6c18 [WebAssembly] Update comments about irreducible control flow.
llvm-svn: 262995
2016-03-09 04:17:36 +00:00
Dan Gohman d7a2eea619 [WebAssembly] Implement irreducible control flow.
This implements a very simple conservative transformation that doesn't
require more than linear code size growth. There's room for much more
optimization in this space.

llvm-svn: 262982
2016-03-09 02:01:14 +00:00
Quentin Colombet 4340b55593 Revert r262759 and r262760.
The fix consisting in using the library call for atomic compare and swap when
the instruction is not safe to use may be incorrect. Indeed the library call may
not exist on all platform. In other words, we need a better fix! 

llvm-svn: 262943
2016-03-08 17:29:11 +00:00
Chad Rosier e40b9513a9 [AArch64] Add MMOs to unscaled pairs.
Test to be committed in follow up commit, per discussion in D17097.
http://reviews.llvm.org/D17097

llvm-svn: 262942
2016-03-08 17:16:38 +00:00
Artyom Skrobov 5ddea6a8e9 [ARM] Simplify ARMInstr*.td by getting rid of identity PatFrags (NFC)
Reviewers: t.p.northover, grosbach, resistor

Subscribers: aemerson, rengolin, llvm-commits

Differential Revision: http://reviews.llvm.org/D17636

llvm-svn: 262936
2016-03-08 16:23:54 +00:00
Hans Wennborg e00b6e7249 Revert r262599 "[X86][SSE] Improve vector ZERO_EXTEND by combining to ZERO_EXTEND_VECTOR_INREG"
This caused PR26870.

llvm-svn: 262935
2016-03-08 16:21:41 +00:00
Igor Breger 999ac754f2 AVX512: Add extract_subvector patterns v8i1->v4i1 , v4i1->v2i1.
Differential Revision: http://reviews.llvm.org/D17953

llvm-svn: 262929
2016-03-08 15:21:25 +00:00
Kit Barton ba532dc816 [Power9] Implement new vsx instructions: load, store instructions for vector and scalar
We follow the comments mentioned in http://reviews.llvm.org/D16842#344378 to
implement this new patch.

This patch implements the following vsx instructions:

Vector load/store:
lxv lxvx lxvb16x lxvl lxvll lxvh8x lxvwsx
stxv stxvb16x stxvh8x stxvl stxvll stxvx
Scalar load/store:
lxsd lxssp lxsibzx lxsihzx
stxsd stxssp stxsibx stxsihx
21 instructions

Phabricator: http://reviews.llvm.org/D16919
llvm-svn: 262906
2016-03-08 03:49:13 +00:00
Dan Gohman 1402606477 [WebAssembly] Update for spec change from tableswitch to br_table.
Also note that the operand order changed; the default label is now listed
after the regular labels.

llvm-svn: 262903
2016-03-08 03:18:12 +00:00
Quentin Colombet f574ab292b [AArch64] Initialize GlobalISel as part of the target initialization.
llvm-svn: 262897
2016-03-08 01:45:36 +00:00
Richard Smith c2a2830e94 A couple more UB fixes for C++14 sized deallocation.
llvm-svn: 262891
2016-03-08 00:59:44 +00:00
Matt Arsenault c89f2919a4 AMDGPU: Match more med3 integer patterns
llvm-svn: 262864
2016-03-07 21:54:48 +00:00
Matt Arsenault 56356c8a9c AMDGPU: Remove a fixme for ptrrtoint handling
llvm-svn: 262854
2016-03-07 21:12:46 +00:00
Matt Arsenault 81d06015c6 AMDGPU: Move function only used by R600
llvm-svn: 262853
2016-03-07 21:10:13 +00:00
Marina Yatsina 5f5de9f89b [ms-inline-asm][AVX512] Add ability to use k registers in MS inline asm + fix bag with curly braces
Until now curly braces could only be used in MS inline assembly to mark block start/end.
All curly braces were removed completely at a very early stage.
This approach caused bugs like:
"m{o}v eax, ebx" turned into "mov eax, ebx" without any error.

In addition, AVX-512 added special operands (e.g., k registers), which are also surrounded by curly braces that mark them as such.
Now, we need to keep the curly braces and identify at a later stage if they are marking block start/end (if so, ignore them), or surrounding special AVX-512 operands (if so, parse them as such).

This patch fixes the bug described above and enables the use of AVX-512 special operands.

This commit is the the llvm part of the patch.
The clang part of the review is: http://reviews.llvm.org/D17766
The llvm part of the review is: http://reviews.llvm.org/D17767

Differential Revision: http://reviews.llvm.org/D17767

llvm-svn: 262843
2016-03-07 18:11:16 +00:00
Simon Pilgrim 253ca348b2 [X86][AVX512] Fixed VPERMT2* shuffle mask decoding and enabled target shuffle combining.
Patch to add support for target shuffle combining of X86ISD::VPERMV3 nodes, including support for detecting unary shuffles.

This uncovered several issues with the X86ISD::VPERMV3 shuffle mask decoding of non-64 bit shuffle mask elements - the bit masking wasn't being correctly computed.

Removed non-constant pool mask decode path as we have no way of testing it right now.

Differential Revision: http://reviews.llvm.org/D17916

llvm-svn: 262809
2016-03-06 21:54:52 +00:00
Valery Pykhtin dc11054f20 [AMDGPU] Using table-driven amd_kernel_code_t field parser in assembler.
Engages code from r262804.

Differential Revision: http://reviews.llvm.org/D17151

llvm-svn: 262808
2016-03-06 20:25:36 +00:00
Valery Pykhtin 50cd3c4ec7 fix sanitizer-ppc64be-linux failure for r262804
error: moving a local object in a return statement prevents copy elision [-Werror,-Wpessimizing-move]

http://lab.llvm.org:8011/builders/sanitizer-ppc64be-linux/builds/930

llvm-svn: 262805
2016-03-06 15:13:54 +00:00
Valery Pykhtin 499a5c6323 [AMDGPU] table-driven parser/printer for amd_kernel_code_t structure fields
Differential Revision: http://reviews.llvm.org/D17150

llvm-svn: 262804
2016-03-06 13:27:13 +00:00
Igor Breger 4d94d4d5f7 AVX512BW: Support llvm intrinsic masked vector load/store for i8/i16 element types on SKX
Differential Revision: http://reviews.llvm.org/D17913

llvm-svn: 262803
2016-03-06 12:38:58 +00:00
Valery Pykhtin 0c6293da68 [AMDGPU] SOPxx instructions operand naming fixed in td files.
dst -> sdst
ssrcN -> srcN

Differential Revision: http://reviews.llvm.org/D17646

llvm-svn: 262801
2016-03-06 10:31:44 +00:00
Craig Topper 581c0087b9 [X86] Use high bits of return value from getEncoding instead of predicate functions to populate the REX and VEX prefix bits that extend register encodings. NFC
llvm-svn: 262800
2016-03-06 08:12:47 +00:00
Craig Topper faab5c68d4 [X86] Remove unnecessary masking. The assert above it already guaranteed it. NFC
llvm-svn: 262799
2016-03-06 08:12:44 +00:00
Craig Topper 5e038cf589 [X86] Use uint8_t instead of unsigned char as it shortens the code and more explicitly reflects the desired size.
llvm-svn: 262798
2016-03-06 08:12:42 +00:00
Igor Breger f1bd761e00 AVX512: Remove VSHRI kmask patterns from TD file. It is incorrect to use kshiftw to implement VSHRI v4i1 , bits 15-4 is undef so the upper bits of v4i1 may not be zeroed. v4i1 should be zero_extend to v16i1 ( or any natively supported vector).
Differential Revision: http://reviews.llvm.org/D17763

llvm-svn: 262797
2016-03-06 07:46:03 +00:00
Simon Pilgrim 40e1a71cdd [X86][AVX] Improved VPERMILPS variable shuffle mask decoding.
Added support for decoding VPERMILPS variable shuffle masks that aren't in the constant pool.

Added target shuffle mask decoding for SCALAR_TO_VECTOR+VZEXT_MOVL cases - these can happen for v2i64 constant re-materialization

Followup to D17681

llvm-svn: 262784
2016-03-05 22:53:31 +00:00
Simon Pilgrim aa99331bad [X86] AMD Bobcat CPU (btver1) doesn't support XSAVE
btver1 is a SSSE3/SSE4a only CPU - it doesn't have AVX and doesn't support XSAVE.

Differential Revision: http://reviews.llvm.org/D17683

llvm-svn: 262782
2016-03-05 22:00:50 +00:00
Quentin Colombet 2a7676b442 [X86] Fix the lowering of setjmp intrinsic on i386.
When the lowering of the setjmp intrinsic requires
a global base pointer to be set, make sure such pointer
gets defined by the CGBR pass.

This fixes PR26742.

llvm-svn: 262762
2016-03-05 00:31:04 +00:00
Quentin Colombet 13b524597d [X86] Do not use cmpxchgXXb when we need the base pointer (RBX).
cmpxchgXXb uses RBX as one of its implicit argument. I.e., when
we use that instruction we need to clobber RBX. This is generally
fine, expect when RBX is a reserved register because in that case,
the register allocator will not track its value and will not
save and restore it when interferences occur.

rdar://problem/24851412

llvm-svn: 262759
2016-03-04 23:29:39 +00:00
David Majnemer 71a1c2c619 Fix build breakage
llvm-svn: 262756
2016-03-04 23:02:15 +00:00
David Majnemer d2f767d2f6 [X86] Support cleaning more than 2**16 bytes of stack
The x86 ret instruction has a 16 bit immediate indicating how many bytes
to pop off of the stack beyond the return address.

There is a problem when extremely large structs are passed by value: we
might not be able to fit the number of bytes to pop into the return
instruction.

To fix this, expand RET_FLAG a little later and use a special sequence
to clean the stack:

pop  %ecx     ; return address is now in %ecx
add  $n, %esp ; clean the stack
push %ecx     ; bring the return address back on the stack
ret           ; pop the return address and jmp to it's value

llvm-svn: 262755
2016-03-04 22:56:17 +00:00
Dan Gohman e6b81362e9 [WebAssembly] Add another possible code-size optimization to README.txt
llvm-svn: 262740
2016-03-04 20:09:57 +00:00
Renato Golin 175c6d6d95 [ARM] Merging 64-bit divmod lib calls into one
When div+rem calls on the same arguments are found, the ARM back-end merges the
two calls into one __aeabi_divmod call for up to 32-bits values. However,
for 64-bit values, which also have a lib call (__aeabi_ldivmod), it wasn't
merging the calls, and thus calling ldivmod twice and spilling the temporary
results, which generated pretty bad code.

This patch legalises 64-bit lib calls for divmod, so that now all the spilling
and the second call are gone. It also relaxes the DivRem combiner a bit on the
legal type check, since it was already checking for isLegalOrCustom on every
value, so the extra check for isTypeLegal was redundant.

Second attempt, creating TLI.isOperationCustom like isOperationExpand, to make
sure we only emit valid types or the ones that were explicitly marked as custom.
Now, passing check-all and test-suite on x86, ARM and AArch64.

This patch fixes PR17193 (and a long time FIXME in the tests).

llvm-svn: 262738
2016-03-04 19:19:36 +00:00
Tom Stellard 649b5db557 AMDGPU/SI: Add support for spiling SGPRs to scratch buffer
Summary:
This is necessary for when we run out of VGPRs and can no
longer use v_{read,write}_lane for spilling SGPRs.

Reviewers: arsenm

Subscribers: arsenm, llvm-commits

Differential Revision: http://reviews.llvm.org/D17592

llvm-svn: 262732
2016-03-04 18:31:18 +00:00
Tom Stellard ebef6f9771 AMDGPU/SI: Enable frame index scavenging during PrologEpilogueInserter
Summary:
This allows us to use virtual registers when we need extra registers
for inserting spill instructions in SIRegisterInfo:eliminateFrameIndex().

Once all the frame indices have been eliminated, the
PrologEpilogueInserter does an extra pass over the program to replace
all virtual registers with physical ones.

This allows us to make more efficient use of our emergency spill slots,
so we only need to create one.

Reviewers: arsenm

Subscribers: arsenm, llvm-commits

Differential Revision: http://reviews.llvm.org/D17591

llvm-svn: 262728
2016-03-04 18:02:01 +00:00
Krzysztof Parzyszek 51155fc0d1 [Hexagon] Fix lowering of calls with the return type of i1
This fixes an assertion in test/CodeGen/Hexagon/ifcvt-edge-weight.ll
when run with -debug-only=isel

llvm-svn: 262726
2016-03-04 17:38:05 +00:00
Zoran Jovanovic a68b67d1ed [mips][microMIPS] Prevent usage of OR16_MMR6 instruction when code for microMIPS is generated.
Author: milena.vujosevic.janicic
Reviewers: dsanders
Differential Revision: http://reviews.llvm.org/D17373

llvm-svn: 262725
2016-03-04 17:34:31 +00:00
Sam Kolton f51f4b8370 Test commit access
llvm-svn: 262714
2016-03-04 12:29:14 +00:00
Valery Pykhtin 824e804bf6 test commit
llvm-svn: 262709
2016-03-04 10:59:50 +00:00
Benjamin Kramer 4dbf3371bb Make headers self-contained again.
llvm-svn: 262702
2016-03-04 10:49:30 +00:00
Nikolay Haustov 5bf46ac150 AMDGPU/SI: add llvm.amdgcn.image.atomic.* intrinsics
These correspond to IMAGE_ATOMIC_* and are going to be used by Mesa for the
GL_ARB_shader_image_load_store extension.

Initial change by Nicolai H.hnle

Differential Revision: http://reviews.llvm.org/D17401

llvm-svn: 262701
2016-03-04 10:39:50 +00:00
Simon Pilgrim f33cb61471 [X86][AVX512BW] Fixed 512-bit PSHUFB shuffle mask decode and added combine test.
PSHUFB decoder was assuming that input was 128 or 256-bit vector only.

llvm-svn: 262661
2016-03-03 21:55:01 +00:00
Simon Pilgrim abcee45b7a [X86][AVX] Better support for the variable mask form of VPERMILPD/VPERMILPS
The variable mask form of VPERMILPD/VPERMILPS were only partially implemented, with much of it still performed as an intrinsic.

This patch properly defines the instructions in terms of X86ISD::VPERMILPV, permitting the opcode to be easily combined as a target shuffle.

Differential Revision: http://reviews.llvm.org/D17681

llvm-svn: 262635
2016-03-03 18:13:53 +00:00
Simon Pilgrim 022afe2538 [X86] Tidied up 256-bit -> 2 x 128-bit vector shift extraction.
lowerShift was manually splitting BUILD_VECTOR cases when it could just call Extract128BitVector which does this anyway.

llvm-svn: 262633
2016-03-03 17:54:35 +00:00
Simon Pilgrim 0107d24810 [X86] Pulled out repeated code testing for constant vector shift amount. NFCI.
llvm-svn: 262631
2016-03-03 17:35:43 +00:00
Amjad Aboud 0ce261d052 MCU target has its own ABI, however X86 interrupt handler calling convention overrides this ABI.
Fixed the ordering to check first for X86 interrupt handler then for MCU target.

Differential Revision: http://reviews.llvm.org/D17801

llvm-svn: 262628
2016-03-03 17:17:54 +00:00
Ahmed Bougacha 671795a985 [X86] Don't assume that shuffle non-mask operands starts at #0.
That's not the case for VPERMV/VPERMV3, which cover all possible
combinations (the C intrinsics use a different order; the AVX vs
AVX512 intrinsics are different still).

Since:
  r246981 AVX-512: Lowering for 512-bit vector shuffles.
VPERMV is recognized in getTargetShuffleMask.

This breaks assumptions in most callers, as they expect
the non-mask operands to start at index 0.
VPERMV has the mask as operand #0; VPERMV3 has it in the middle.

Instead of the faulty assumption, have getTargetShuffleMask return
its operands as well.

One alternative we considered was to change the operand order of
VPERMV, but we agreed to stick to the instruction order, as there
are more AVX512 weirdness to cover (vpermt2/vpermi2 in particular).

Differential Revision: http://reviews.llvm.org/D17041

llvm-svn: 262627
2016-03-03 16:53:50 +00:00
Sanjay Patel d6cb4ec2a2 [AArch64] fold 'isPositive' vector integer operations (PR26819)
This is one of the cases shown in:
https://llvm.org/bugs/show_bug.cgi?id=26819

Shift and negate is what InstCombine prefers to produce (and I tried to make it do more of that
in http://reviews.llvm.org/rL262424 ), so we should recognize that pattern as something that might
come from autovectorization even if it's unlikely to be produced from C NEON intrinsics.

The patch is based on the x86 equivalent:
http://reviews.llvm.org/rL262036

Differential Revision: http://reviews.llvm.org/D17834

llvm-svn: 262623
2016-03-03 15:56:08 +00:00
Igor Breger 639fde79b0 AVX512: Combine AND + TESTM instructions .
Differential Revision: http://reviews.llvm.org/D17844

llvm-svn: 262621
2016-03-03 14:18:38 +00:00
Simon Pilgrim 91dd0a796c [X86][SSE] Improve vector ZERO_EXTEND by combining to ZERO_EXTEND_VECTOR_INREG
Generalise the existing SIGN_EXTEND to SIGN_EXTEND_VECTOR_INREG combine to support zero extension as well and get rid of a lot of unnecessary ANY_EXTEND + mask patterns.

Differential Revision: http://reviews.llvm.org/D17691

llvm-svn: 262599
2016-03-03 09:43:28 +00:00
Renato Golin 3d78271eac Revert "[ARM] Merging 64-bit divmod lib calls into one"
This reverts commit r262507, which broke some ARM buildbots.

llvm-svn: 262594
2016-03-03 08:57:44 +00:00
Michael Zuckerman c4d054fa4a [LLVM][AVX512] PSRLWI Chnage imm8 to int
Differential Revision: http://reviews.llvm.org/D17753

llvm-svn: 262592
2016-03-03 08:54:05 +00:00
Tom Stellard cc7067a668 AMDGPU: Insert two S_NOP instructions for every high level source statement.
Patch by: Konstantin Zhuravlyov

Summary: Tools, such as debugger, need to pause execution based on user input (i.e. breakpoint). In order to do this, two S_NOP instructions are inserted for each high level source statement: one before first isa instruction of high level source statement, and one after last isa instruction of high level source statement. Further, debugger may replace S_NOP instructions with S_TRAP instructions based on user input.

Reviewers: tstellarAMD, arsenm

Subscribers: echristo, dblaikie, arsenm, llvm-commits

Differential Revision: http://reviews.llvm.org/D17454

llvm-svn: 262579
2016-03-03 03:53:29 +00:00
Tom Stellard 600ca6fd39 AMDGPU/SI: Don't try to move scratch wave offset when there are no free SGPRs
Summary:
When there were no free SGPRs, we were trying to move this value into
some of the reserved registers which was causing a segmentation fault.

Reviewers: arsenm

Subscribers: arsenm, llvm-commits

Differential Revision: http://reviews.llvm.org/D17590

llvm-svn: 262577
2016-03-03 03:45:09 +00:00
Hans Wennborg 153e4b0f11 [X86] Enable forwarding bool arguments in tail calls (PR26305)
The code was previously not able to track a boolean argument
at a call site back to the formal argument of the caller.

Differential Revision: http://reviews.llvm.org/D17786

llvm-svn: 262575
2016-03-03 02:06:32 +00:00
Tim Shen 6e676a84ad [PPCVSXFMAMutate] Temporarily disable this pass
llvm-svn: 262573
2016-03-03 01:27:35 +00:00
David Majnemer 1ef654024f [X86] Don't give catch objects a displacement of zero
Catch objects with a displacement of zero do not initialize a catch
object.  The displacement is relative to %rsp at the end of the
function's prologue for x86_64 targets.

If we place an object at the top-of-stack, we will end up wit a
displacement of zero resulting in our catch object remaining
uninitialized.

Address this by creating our catch objects as fixed objects.  We will
ensure that the UnwindHelp object is created after the catch objects so
that no catch object will have a displacement of zero.

Differential Revision: http://reviews.llvm.org/D17823

llvm-svn: 262546
2016-03-03 00:01:25 +00:00
Matt Arsenault 8226fc4829 AMDGPU: Simplify boolean conditional return statements
Patch by Richard Thomson

llvm-svn: 262536
2016-03-02 23:00:21 +00:00
Renato Golin 93e42d9934 [ARM] Merging 64-bit divmod lib calls into one
When div+rem calls on the same arguments are found, the ARM back-end merges the
two calls into one __aeabi_divmod call for up to 32-bits values. However,
for 64-bit values, which also have a lib call (__aeabi_ldivmod), it wasn't
merging the calls, and thus calling ldivmod twice and spilling the temporary
results, which generated pretty bad code.

This patch legalises 64-bit lib calls for divmod, so that now all the spilling
and the second call are gone. It also relaxes the DivRem combiner a bit on the
legal type check, since it was already checking for isLegalOrCustom on every
value, so the extra check for isTypeLegal was redundant.

This patch fixes PR17193 (and a long time FIXME in the tests).

llvm-svn: 262507
2016-03-02 19:35:45 +00:00
Reid Kleckner 65f9d9cd32 Revert "[X86] Elide references to _chkstk for dynamic allocas"
This reverts commit r262370.

It turns out there is code out there that does sequences of allocas
greater than 4K: http://crbug.com/591404

The goal of this change was to improve the code size of inalloca call
sequences, but we got tangled up in the mess of dynamic allocas.
Instead, we should come back later with a separate MI pass that uses
dominance to optimize the full sequence. This should also be able to
remove the often unneeded stacksave/stackrestore pairs around the call.

llvm-svn: 262505
2016-03-02 19:20:59 +00:00
Matthias Braun f290912d22 ARM: Introduce conservative load/store optimization mode
Most of the time ARM has the CCR.UNALIGN_TRP bit set to false which
means that unaligned loads/stores do not trap and even extensive testing
will not catch these bugs. However the multi/double variants are not
affected by this bit and will still trap. In effect a more aggressive
load/store optimization will break existing (bad) code.

These bugs do not necessarily manifest in the broken code where the
misaligned pointer is formed but often later in perfectly legal code
where it is accessed. This means recompiling system libraries (which
have no alignment bugs) with a newer compiler will break existing
applications (with alignment bugs) that worked before.

So (under protest) I implemented this safe mode which limits the
formation of multi/double operations to cases that are not affected by
user code (stack operations like spills/reloads) or cases where the
normal operations trap anyway (floating point load/stores). It is
disabled by default.

Differential Revision: http://reviews.llvm.org/D17015

llvm-svn: 262504
2016-03-02 19:20:00 +00:00
Geoff Berry 62c1a1e7c7 [AArch64] Enable non-leaf frame pointer elimination.
Summary:
This change enables frame pointer elimination in non-leaf functions.
The -fomit-frame-pointer option still needs to be used when compiling
via clang (or an equivalent method of not setting the
'no-frame-pointer-elim*' function attributes if generating llvm IR via
some other method) to take advantage of this optimization.

This change should be NFC when compiling via clang without
-fomit-frame-pointer.

Reviewers: t.p.northover

Subscribers: aemerson, rengolin, tberghammer, qcolombet, llvm-commits, danalbert, mcrosier, srhines

Differential Revision: http://reviews.llvm.org/D17730

llvm-svn: 262495
2016-03-02 17:58:31 +00:00
Michael Zuckerman 927fdaee88 [LLVM][AVX512]PSRAWI Change imm8 to int.
Differential Revision: http://reviews.llvm.org/D17705

llvm-svn: 262480
2016-03-02 12:05:07 +00:00
Simon Pilgrim c02b72627a [X86][SSE] Lower 128-bit MOVDDUP with existing VBROADCAST mechanisms
We have a number of useful lowering strategies for VBROADCAST instructions (both from memory and register element 0) which the 128-bit form of the MOVDDUP instruction can make use of.

This patch tweaks lowerVectorShuffleAsBroadcast to enable it to broadcast 2f64 args using MOVDDUP as well.

It does require a slight tweak to the lowerVectorShuffleAsBroadcast mechanism as the existing MOVDDUP lowering uses isShuffleEquivalent which can match binary shuffles that can lower to (unary) broadcasts.

Differential Revision: http://reviews.llvm.org/D17680

llvm-svn: 262478
2016-03-02 11:43:05 +00:00
Nikolay Haustov f2fbabe9c1 Revert "[AMDGPU] table-driven parser/printer for amd_kernel_code_t structure fields"
Build failure with clang.

llvm-svn: 262477
2016-03-02 11:16:56 +00:00
Nikolay Haustov f0f24628cb Revert "[AMDGPU] Using table-driven amd_kernel_code_t field parser in assembler."
Build failure with clang.

llvm-svn: 262475
2016-03-02 10:54:21 +00:00
Nikolay Haustov 73447a9714 [AMDGPU] Using table-driven amd_kernel_code_t field parser in assembler.
complementary patch to table-driven amd_kernel_code_t field parser/printer utility. lit tests passed.

Patch by: Valery Pykhtin

Differential Revision: http://reviews.llvm.org/D17151

llvm-svn: 262474
2016-03-02 10:36:30 +00:00
Nikolay Haustov 6c8c74969a [AMDGPU] table-driven parser/printer for amd_kernel_code_t structure fields
This is going to be used in .hsatext disassembler and can be used
in current assembler parser (lit tests passed on parsing).
Code using this helpers isn't included in this patch.

Benefits:

unified approach
fast field name lookup on parsing
Later I would like to enhance some of the field naming/syntax using this code.

Patch by: Valery Pykhtin

Differential Revision: http://reviews.llvm.org/D17150

llvm-svn: 262473
2016-03-02 10:36:25 +00:00
Craig Topper 1d3f4aefd6 [X86] Remove unnecessary call to isReg from emitter's DestMem handling for VEX prefix. The operand is always a register. NFC
llvm-svn: 262468
2016-03-02 07:32:45 +00:00
Craig Topper 6a7cd42213 [X86] Make X86MCCodeEmitter::DetermineREXPrefix locate operands more like how VEX prefix handling does.
llvm-svn: 262467
2016-03-02 07:32:43 +00:00
David Majnemer 5aadde1ecc [X86] Permit reading of the FLAGS register without it being previously defined
We modeled the RDFLAGS{32,64} operations as "using" {E,R}FLAGS.
While technically correct, this is not be desirable for folks who want
to examine aspects of the FLAGS register which are not related to
computation like whether or not CPUID is a valid instruction.

Differential Revision: http://reviews.llvm.org/D17782

llvm-svn: 262465
2016-03-02 06:46:52 +00:00
Craig Topper d4dabb3939 [X86] Remove assertion I accidentally left in.
llvm-svn: 262464
2016-03-02 06:35:22 +00:00
Craig Topper a267431fa6 [X86] Be more structured about how we capture the register number when it is encoded in bits 7:4 of the immediate.
For some instructions the register is not the last operand and the immediate handling had to detect this and hardcode the index to find it. It also required CurOp to be pointing at the last operand handled in the Form switch whereas for any instruction it would be pointing at the next operand.

Now we just capture the value in the Form switch when we know exactly where it is and the CurOp pointer can behave normally.

llvm-svn: 262462
2016-03-02 06:06:18 +00:00
Craig Topper cf65c62737 [X86] Use MCPhysReg and uint16_t for static arrays of registers and opcodes respectively should reduce size tiny bit. NFC
llvm-svn: 262458
2016-03-02 04:42:31 +00:00
Matt Arsenault f2dcb4737b AMDGPU: Fix bug 26659.
Fix checking the same instruction twice instead of the
second branch that uses vccz. I don't think this matters
currently because s_branch_vccnz is always used currently.

llvm-svn: 262457
2016-03-02 04:12:39 +00:00
Matt Arsenault a266bd8760 AMDGPU: Cleanup suggested in bug 23960
llvm-svn: 262456
2016-03-02 04:05:14 +00:00
Matt Arsenault 5de68cbc4c Bug 20810: Use report_fatal_error instead of unreachable
llvm-svn: 262455
2016-03-02 03:33:55 +00:00
Colin LeMahieu 2d497a0078 [NFC] Convert tabs to spaces.
llvm-svn: 262411
2016-03-01 22:05:03 +00:00
Matthias Braun 37d884fdf4 AArch64: Reenable CompleteModel for A53, A57 and Kryo models
The fixes in r262393 completed them as well.

llvm-svn: 262408
2016-03-01 21:55:35 +00:00
Colin LeMahieu 5cb6eea664 [Hexagon] Modifying r262258 to only be in effect in the hand assembler path, not the integrated assembler.
llvm-svn: 262400
2016-03-01 21:37:41 +00:00
Matthias Braun a6cfb6f682 AArch64: Add missing schedinfo, check completeness for cyclone
This adds some missing generic schedule info definitions, enables
completeness checking for cyclone and fixes a typo uncovered by that.

Differential Revision: http://reviews.llvm.org/D17748

llvm-svn: 262393
2016-03-01 21:20:31 +00:00
Kit Barton e725669483 [Power9] Implement new vector compare, extract, insert instructions
This change implements the following vector operations:

  - Vector Compare Not Equal
    - vcmpneb(.) vcmpneh(.) vcmpnew(.)
    - vcmpnezb(.) vcmpnezh(.) vcmpnezw(.)
  - Vector Extract Unsigned
    - vextractub vextractuh vextractuw vextractd
    - vextublx vextubrx vextuhlx vextuhrx vextuwlx vextuwrx
  - Vector Insert
    - vinsertb vinserth vinsertw vinsertd

26 instructions.

Phabricator: http://reviews.llvm.org/D15916
llvm-svn: 262392
2016-03-01 20:51:57 +00:00
Sanjay Patel 18988ae66c [x86] use getBitcast()
This isn't quite NFC because some of the SDLocs may change which could
cause scheduling differences. But no regression tests are affected and
there is no functional change intended.

llvm-svn: 262391
2016-03-01 20:47:02 +00:00
Geoff Berry a0df341082 Revert "[AArch64] Fix isLegalAddImmediate() to return true for valid negative values."
Revert r262248 in an attempt to fix the clang-native-aarch64-full
    bot and to investigate a performance regression in
    SingleSource/Benchmarks/CoyoteBench/huffbench

llvm-svn: 262388
2016-03-01 20:28:52 +00:00
Vasileios Kalintiris 36901dd1c3 Revert "[mips] Promote the result of SETCC nodes to GPR width."
This reverts commit r262316.

It seems that my change breaks an out-of-tree chromium buildbot, so
I'm reverting this in order to investigate the situation further.

llvm-svn: 262387
2016-03-01 20:25:43 +00:00
Kit Barton 01cd2e7a4b New file to track implementation status of new POWER9 instructions
llvm-svn: 262386
2016-03-01 20:19:43 +00:00
Matthias Braun 17cb57995e TableGen: Check scheduling models for completeness
TableGen checks at compiletime that for scheduling models with
"CompleteModel = 1" one of the following holds:

- Is marked with the hasNoSchedulingInfo flag
- The instruction is a subclass of Sched
- There are InstRW definitions in the scheduling model

Typical steps necessary to complete a model:

- Ensure all pseudo instructions that are expanded before machine
  scheduling (usually everything handled with EmitYYY() functions in
  XXXTargetLowering).
- If a CPU does not support some instructions mark the corresponding
  resource unsupported: "WriteRes<WriteXXX, []> { let Unsupported = 1; }".
- Add missing scheduling information.

Differential Revision: http://reviews.llvm.org/D17747

llvm-svn: 262384
2016-03-01 20:03:21 +00:00
Justin Lebar 2daaa1ceca [NVPTX] Annotate param loads/stores as mayLoad/mayStore.
Summary:
Tablegen was unable to determine that param loads/stores were actually
reading or writing from memory.  I think this isn't a problem in
practice for param stores, because those occur in a block right before
we make our call.  But param loads don't have to at the very beginning
of a function, so should be annotated as mayLoad so we don't incorrectly
optimize them.

Reviewers: jholewinski

Subscribers: jholewinski, llvm-commits

Differential Revision: http://reviews.llvm.org/D17471

llvm-svn: 262381
2016-03-01 19:44:22 +00:00
Justin Lebar 536c8b7446 [NVPTX] Remove workaround for tablegen crash in NVPTXInstrInfo.td.
Summary: Looks like this was caused by a typo.

Reviewers: jholewinski

Subscribers: jholewinski, llvm-commits, tra

Differential Revision: http://reviews.llvm.org/D17357

llvm-svn: 262380
2016-03-01 19:44:20 +00:00
Justin Lebar b5ca00a58d [NVPTX] Use different, convergent MIs for convergent calls.
Summary:
Calls sometimes need to be convergent.  This is already handled at the
LLVM IR level, but it also needs to be handled at the MI level.

Ideally we'd propagate convergence from instructions, down through the
selection DAG, and into MIs.  But this is Hard, and would affect
optimizations in the SDNs -- right now only SDNs with two operands have
any flags at all.

Instead, here's a much simpler hack: Add new opcodes for NVPTX for
convergent calls, and generate these when lowering convergent LLVM
calls.

Reviewers: jholewinski

Subscribers: jholewinski, chandlerc, joker.eph, jhen, tra, llvm-commits

Differential Revision: http://reviews.llvm.org/D17423

llvm-svn: 262373
2016-03-01 19:24:03 +00:00
Justin Lebar 93e7a9b91c [NVPTX] Nix hack used to emit '{' and '}' for NVPTX calls.
Summary: Tablegen understands backslash as an escape char; that's sufficient.

Reviewers: jholewinski

Subscribers: llvm-commits, tra, jholewinski

Differential Revision: http://reviews.llvm.org/D17432

llvm-svn: 262372
2016-03-01 19:24:00 +00:00
Justin Lebar 877f5acc60 [NVPTX] Reformat NVPTXInstrInfo.td, and add additional comments.
Summary:
Also simplify some of the embedded C++ logic.

No functional changes.

Reviewers: jholewinski

Subscribers: llvm-commits, tra, jholewinski

Differential Revision: http://reviews.llvm.org/D17354

llvm-svn: 262371
2016-03-01 19:23:30 +00:00
David Majnemer 791b88b6da [X86] Elide references to _chkstk for dynamic allocas
The _chkstk function is called by the compiler to probe the stack in an
order consistent with Windows' expectations.  However, it is possible to
elide the call to _chkstk and manually adjust the stack pointer if we
can prove that the allocation is fixed size and smaller than the probe
size.

This shrinks chrome.dll, chrome_child.dll and chrome.exe by a
cummulative ~133 KB.

Differential Revision: http://reviews.llvm.org/D17679

llvm-svn: 262370
2016-03-01 19:20:23 +00:00
Sanjay Patel 8a37a988e6 fix function names; NFC
llvm-svn: 262367
2016-03-01 19:14:09 +00:00
Matt Arsenault 03dac8d8e4 DAGCombiner: Turn extract of bitcasted integer into truncate
This reduces the number of bitcast nodes and generally cleans up the
DAG when bitcasting between integers and vectors everywhere.

llvm-svn: 262358
2016-03-01 18:01:37 +00:00
Changpeng Fang 24f035af32 AMDGPU/SI: Implement DS_PERMUTE/DS_BPERMUTE Instruction Definitions and Intrinsics
Summary:
  This patch impleemnts DS_PERMUTE/DS_BPERMUTE instruction definitions and intrinsics,
which are new since VI.

Reviewers: tstellarAMD, arsenm

Subscribers: llvm-commits, arsenm

Differential Revision: http://reviews.llvm.org/D17614

llvm-svn: 262356
2016-03-01 17:51:23 +00:00
Michael Zuckerman 433b241570 [LLVM][AVX512] PSRL{DI|QI} Change imm8 to int
Differential Revision: http://reviews.llvm.org/D17713

llvm-svn: 262353
2016-03-01 17:46:32 +00:00
Hans Wennborg e64cf9dddb [X86] Check that attribute parameters match for tail calls (PR26590)
In the code below on 32-bit targets, x would previously get forwarded to g()
without sign-extension to 32 bits as required by the parameter attribute.

  void g(signed short);
  void f(unsigned short x) {
    g(x);
  }

llvm-svn: 262352
2016-03-01 17:45:23 +00:00
Sanjay Patel 2ca144f14c fix documentation comments; NFC
llvm-svn: 262351
2016-03-01 17:25:35 +00:00
Sanjay Patel 9fea531fec function names start with a lowercase letter; NFC
llvm-svn: 262347
2016-03-01 16:17:48 +00:00
Nikolay Haustov e309e1415d [AMDGPU] Remove unused disassembler code.
llvm-svn: 262346
2016-03-01 16:02:40 +00:00
Nikolay Haustov 47a115cd41 [AMDGPU] Fix build warnings.
llvm-svn: 262338
2016-03-01 14:50:59 +00:00
Nikolay Haustov ac106add0f [AMDGPU] Disassembler code refactored + error messages.
Idea behind this change is to make code shorter and as much common for all targets as possible. Let's even accept more code than is valid for a particular target, leaving it for the assembler to sort out.

64bit instructions decoding added.

Error\warning messages on unrecognized instructions operands added, InstPrinter allowed to print invalid operands helping to find invalid/unsupported code.

The change is massive and hard to compare with previous version, so it makes sense just to take a look on the new version. As a bonus, with a few TD changes following, it disassembles the majority of instructions. Currently it fully disassembles >300K binary source of some blas kernel.

Previous TODOs were saved whenever possible.

Patch by: Valery Pykhtin

Differential Revision: http://reviews.llvm.org/D17720

llvm-svn: 262332
2016-03-01 13:57:29 +00:00
Michael Zuckerman 7878888690 [AVX512][PSRAQ][PSRAD] Change imm8 to int.
Differential Revision: http://reviews.llvm.org/D17692

llvm-svn: 262320
2016-03-01 11:36:23 +00:00
Amjad Aboud 719325fe11 Disallow generating vzeroupper before return instruction (iret) in interrupt handler function.
This resolves https://llvm.org/bugs/show_bug.cgi?id=26412

Differential Revision: http://reviews.llvm.org/D17542

llvm-svn: 262319
2016-03-01 11:32:03 +00:00
Vasileios Kalintiris 3a8f7f9e31 [mips] Promote the result of SETCC nodes to GPR width.
Summary:
This patch modifies the existing comparison, branch, conditional-move
and select patterns, and adds new ones where needed. Also, the updated
SLT{u,i,iu} set of instructions generate a GPR width result.

The majority of the code changes in the Mips back-end fix the wrong
assumption that the result of SETCC nodes always produce an i32 value.
The changes in the common code path account for the fact that in 64-bit
MIPS targets, i1 is promoted to i32 instead of i64.

Reviewers: dsanders

Subscribers: dsanders, llvm-commits

Differential Revision: http://reviews.llvm.org/D10970

llvm-svn: 262316
2016-03-01 10:08:01 +00:00
Nikolay Haustov ea8febde04 [TableGen] AsmMatcher: Skip optional operands in the midle of instruction if it is not present
Previosy, if actual instruction have one of optional operands then other optional operands listed before this also should be presented.
For example instruction v_fract_f32 v0, v1, mul:2 have one optional operand - OMod and do not have optional operand clamp. Previously this was not allowed because clamp is listed before omod in AsmString:

string AsmString = "v_fract_f32$vdst, $src0_modifiers$clamp$omod";
Making this work required some hacks (both OMod and Clamp match classes have same PredicateMethod).

Now, if MatchInstructionImpl meets formal optional operand that is not presented in actual instruction it skips this formal operand and tries to match current actual operand with next formal.

Patch by: Sam Kolton

Review: http://reviews.llvm.org/D17568

[AMDGPU] Assembler: Check immediate types for several optional operands in predicate methods
With this change you should place optional operands in order specified by asm string:

clamp -> omod
offset -> glc -> slc -> tfe
Fixes for several tests.
Depends on D17568

Patch by: Sam Kolton

Review: http://reviews.llvm.org/D17644
llvm-svn: 262314
2016-03-01 08:34:43 +00:00
Craig Topper 073e947f1b [X86] Centralize the masking of TSFlags with FormMask into a variable earlier so we can stop masking in multiple places. NFC
llvm-svn: 262312
2016-03-01 07:15:59 +00:00
Craig Topper 5c8dc5f064 [X86] Localize a temporary variable into the cases its need in. NFC
llvm-svn: 262310
2016-03-01 06:42:48 +00:00
Craig Topper b8c29b4ae9 [X86] Be consistent about using pre/post increment/decrement in nearby code. NFC
llvm-svn: 262309
2016-03-01 06:42:46 +00:00
Craig Topper d40a55064f [X86] Combine some initialization code with variable declaration and comments. NFC
llvm-svn: 262301
2016-03-01 05:42:16 +00:00
Matt Arsenault d275fcabcb AMDGPU: Don't emit build_pair during udivrem legalization
Technically you aren't supposed to emit these after type legalization
for some reason, and we use vector extracts of bitcasted integers
as the canonical way to do this.

llvm-svn: 262298
2016-03-01 05:06:05 +00:00
Matt Arsenault f4dfc1a027 AMDGPU: Don't use estimated stack size when we know the real stack size
llvm-svn: 262297
2016-03-01 04:58:20 +00:00
Matt Arsenault 59b8b77405 AMDGPU: Set HasExtractBitInsn
This currently does not have the control over the bitwidth,
and there are missing optimizations to reduce the integer to
32-bit if it can be.

But in most situations we do want the sinking to occur.

llvm-svn: 262296
2016-03-01 04:58:17 +00:00
Eric Christopher 114fa1c3f6 Simplify some boolean conditional return statements in AArch64.
http://reviews.llvm.org/D9979

Patch by Richard Thomson (and some conflict resolution by me).

llvm-svn: 262266
2016-02-29 22:50:49 +00:00
Colin LeMahieu ab9eca4d9f [Hexagon] As a size optimization, not lazy extending TPREL or DTPREL variants since they're usually in range.
llvm-svn: 262258
2016-02-29 21:21:56 +00:00
Colin LeMahieu 9e5a9c32db [Hexagon] Missed member initialization causing ubsan failure.
llvm-svn: 262252
2016-02-29 20:42:25 +00:00
Geoff Berry f5ba61d18c [AArch64] Fix isLegalAddImmediate() to return true for valid negative values.
Reviewers: t.p.northover, jmolloy

Subscribers: mcrosier, aemerson, llvm-commits, rengolin

Differential Revision: http://reviews.llvm.org/D17463

llvm-svn: 262248
2016-02-29 19:53:22 +00:00
Ahmed Bougacha bb5d7d7ed8 [X86] Move the ATOMIC_LOAD_OP ISel from DAGToDAG to ISelLowering. NFCI.
This is long-standing dirtiness, as acknowledged by r77582:

    The current trick is to select it into a merge_values with
    the first definition being an implicit_def. The proper solution is
    to add new ISD opcodes for the no-output variant.

Doing this before selection will let us combine away some constructs.

Differential Revision: http://reviews.llvm.org/D17659

llvm-svn: 262244
2016-02-29 19:28:07 +00:00
Colin LeMahieu b9f1eae328 [Hexagon] Setting sign mismatch flag on expression instead of using bit tricks.
llvm-svn: 262243
2016-02-29 19:17:56 +00:00
David Majnemer e60ee3b8ce [WinEH] Make setjmp work correctly with EH
32-bit X86 EH on Windows utilizes a stack of registration nodes
allocated and deallocated on entry/exit.  A registration node contains a
bunch of EH personality specific information like which try-state we are
currently in.

Because a setjmp target allows control flow from arbitrary program
points, there is no way to ensure that the try-state we are in is
correctly updated once we transfer control.

MSVC compatible compilers, like MSVC and ICC, utilize runtime helpers to
reinitialize the try-state when a longjmp occurs.  This is implemented
by adding additional arguments to _setjmp3: the desired try-state and
a helper routine to update the try-state.

Differential Revision: http://reviews.llvm.org/D17721

llvm-svn: 262241
2016-02-29 19:16:03 +00:00
Colin LeMahieu 73cd686ce1 [Hexagon] Using MustExtend flag on expression instead of passing around bools.
llvm-svn: 262238
2016-02-29 18:39:51 +00:00
Nemanja Ivanovic 1a5706ca1b Fix for PR26180
Corresponds to Phabricator review:
http://reviews.llvm.org/D16592

This fix includes both an update to how we handle the "generic" CPU on LE
systems as well as Anton's fix for the Fast Isel issue.

llvm-svn: 262233
2016-02-29 16:42:27 +00:00
Daniel Sanders 03a8d2f8ec [mips] Range check uimm20 and fixed a bug this revealed.
Summary:
The bug was that dextu's operand 3 would print 0-31 instead of 32-63 when
printing assembly. This came up when replacing
MipsInstPrinter::printUnsignedImm() with a version that could handle arbitrary
bit widths.

MipsAsmPrinter::printUnsignedImm*() don't seem to be used so they have been
removed.

Reviewers: vkalintiris

Subscribers: dsanders, llvm-commits

Differential Revision: http://reviews.llvm.org/D15521

llvm-svn: 262231
2016-02-29 16:06:38 +00:00
Vasileios Kalintiris 29620aca3e [mips] Do not use SLL for ANY_EXTEND nodes as the high bits are undefined.
Reviewers: dsanders

Subscribers: dsanders, llvm-commits

Differential Revision: http://reviews.llvm.org/D15420

llvm-svn: 262230
2016-02-29 15:58:12 +00:00
Daniel Sanders 611eb82953 [mips] Make isel select the correct DEXT variant up front.
Summary:
Previously, it would always select DEXT and substitute any invalid matches
for DEXTU/DEXTM during MipsMCCodeEmitter::encodeInstruction(). This works
but causes problems when adding range checked immediates to IAS.

Now isel selects the correct variant up front.

Reviewers: vkalintiris

Subscribers: dsanders, llvm-commits

Differential Revision: http://reviews.llvm.org/D16810

llvm-svn: 262229
2016-02-29 15:26:54 +00:00
Daniel Sanders 90f0d0b8e3 [mips] Make symbols an acceptable branch target when expanding compare-to-immediate-and-branch macros.
Reviewers: vkalintiris

Subscribers: llvm-commits, vkalintiris, dim, seanbruno, dsanders

Differential Revision: http://reviews.llvm.org/D15369

llvm-svn: 262213
2016-02-29 11:24:49 +00:00
Vasileios Kalintiris c9aaa3171d [mips] Remove unused function declarations from MipsRegisterInfo.h. NFC.
llvm-svn: 262187
2016-02-28 16:55:28 +00:00
JF Bastien 1afd1e2baa WebAssembly: fix build
More API churn, experimental target got sad.

llvm-svn: 262179
2016-02-28 15:33:53 +00:00
Michael Zuckerman 96836fc81c [AVX512][PSLLW ][PSLLV] Change imm8 to int
Differential Revision: http://reviews.llvm.org/D17684

llvm-svn: 262176
2016-02-28 07:32:10 +00:00
Matt Arsenault 3a61985b2f AMDGPU: More bits of frame index are known to be zero
The maximum private allocation for the whole GPU is 4G,
so the maximum possible index for a single workitem is the
maximum size divided by the smallest granularity for a dispatch.

This increases the number of known zero high bits, which
enables more offset folding. The maximum private size per
workitem with this is 128M but may be smaller still.

llvm-svn: 262153
2016-02-27 20:26:57 +00:00
Duncan P. N. Exon Smith be8f8c4478 CodeGen: Update LiveIntervalAnalysis API to use MachineInstr&, NFC
These parameters aren't expected to be null, so take them by reference.

llvm-svn: 262151
2016-02-27 20:14:29 +00:00
Duncan P. N. Exon Smith fd8cc23220 CodeGen: Change MachineInstr to use MachineInstr&, NFC
Change MachineInstr API to prefer MachineInstr& over MachineInstr*
whenever the parameter is expected to be non-null.  Slowly inching
toward being able to fix PR26753.

llvm-svn: 262149
2016-02-27 20:01:33 +00:00
Duncan P. N. Exon Smith 1f6624ae34 AArch64: Use MachineInstr& in guaranteesZeroRegInBlock(), NFC
llvm-svn: 262143
2016-02-27 19:12:54 +00:00
Duncan P. N. Exon Smith 5702287809 CodeGen: Update DFAPacketizer API to take MachineInstr&, NFC
In all but one case, change the DFAPacketizer API to take MachineInstr&
instead of MachineInstr*.  In DFAPacketizer::endPacket(), take
MachineBasicBlock::iterator.  Besides cleaning up the API, this is in
search of PR26753.

llvm-svn: 262142
2016-02-27 19:09:00 +00:00
Duncan P. N. Exon Smith f9ab416d70 WIP: CodeGen: Use MachineInstr& in MachineInstrBundle.h, NFC
Update APIs in MachineInstrBundle.h to take and return MachineInstr&
instead of MachineInstr* when the instruction cannot be null.  Besides
being a nice cleanup, this is tacking toward a fix for PR26753.

llvm-svn: 262141
2016-02-27 17:05:33 +00:00
JF Bastien 13d3b9b777 WebAssembly: fix build
It was broken by the work for PR26753.

llvm-svn: 262140
2016-02-27 16:38:23 +00:00
Simon Pilgrim 9e10b1655c Tidyup for loops - don't repeat upper limit evaluation if you don't have to. NFCI.
llvm-svn: 262137
2016-02-27 13:26:58 +00:00
Chris Dewhurst 053826af69 The patch adds missing registers and instructions to complete all the registers supported by the Sparc v8 manual.
These are all co-processor registers, with the exception of the floating-point deferred-trap queue register.
Although these will not be lowered automatically by any instructions, it allows the use of co-processor
instructions implemented by inline-assembly.

Code Reviewed at http://reviews.llvm.org/D17133, with the exception of a very small change in brace placement in SparcInstrInfo.td,
which was formerly causing a problem in the disassembly of the %fq register.

llvm-svn: 262133
2016-02-27 12:49:59 +00:00
Simon Pilgrim 3b42ca0760 Strip trailing whitespace. NFCI.
llvm-svn: 262131
2016-02-27 11:49:16 +00:00
Matt Arsenault 9d82ee7526 AMDGPU: Split vi-insts subtarget feature
This will be more useful for marking builtins acceptable for which
subtargets.

llvm-svn: 262121
2016-02-27 08:53:55 +00:00
Matt Arsenault 274d34e725 AMDGPU: Add s_sleep intrinsic
llvm-svn: 262120
2016-02-27 08:53:52 +00:00
Matt Arsenault 61738cbcb6 AMDGPU: Implement readcyclecounter
This matches the behavior of the HSAIL clock instruction.
s_realmemtime is used if the subtarget supports it, and falls
back to s_memtime if not.

Also introduces new intrinsics for each of s_memtime / s_memrealtime.

llvm-svn: 262119
2016-02-27 08:53:46 +00:00
Duncan P. N. Exon Smith 3ac9cc6156 CodeGen: Take MachineInstr& in SlotIndexes and LiveIntervals, NFC
Take MachineInstr by reference instead of by pointer in SlotIndexes and
the SlotIndex wrappers in LiveIntervals.  The MachineInstrs here are
never null, so this cleans up the API a bit.  It also incidentally
removes a few implicit conversions from MachineInstrBundleIterator to
MachineInstr* (see PR26753).

At a couple of call sites it was convenient to convert to a range-based
for loop over MachineBasicBlock::instr_begin/instr_end, so I added
MachineBasicBlock::instrs.

llvm-svn: 262115
2016-02-27 06:40:41 +00:00
Ahmed Bougacha ffcab7bf32 [X86] Fix a stale comment. NFC.
llvm-svn: 262087
2016-02-26 22:59:57 +00:00
Ahmed Bougacha 55e1592889 [X86] Remove the unused SDTX86atomicBinary. NFC.
llvm-svn: 262086
2016-02-26 22:59:41 +00:00
Simon Pilgrim 4d1a088323 Strip trailing whitespace. NFCI.
llvm-svn: 262083
2016-02-26 22:28:50 +00:00