34469 Commits

Author SHA1 Message Date
Sanjay Patel 7c912898a5 [x86] enable machine combiner reassociations for scalar 'and' insts
llvm-svn: 246300
2015-08-28 14:09:48 +00:00
Ahmed Bougacha f9c19da03a [CodeGen] Support (and default to) expanding READCYCLECOUNTER to 0.
For targets that didn't support this, this will let us respect the
langref instead of failing to select.

Note that we don't need to change the 32-bit x86/PPC lowerings (to
account for the result type/# difference) because they're both
custom and bypass type legalization.

llvm-svn: 246258
2015-08-28 01:49:59 +00:00
Quentin Colombet fa4ecb4b9a [AArch64][CollectLOH] Fix a regression that prevented us from detecting chains of
more than 2 instructions.

I introduced this regression a while back and did not notice it because I
somehow forgot to push the initial test cases for the pass!

Fix that as well!

llvm-svn: 246239
2015-08-27 23:47:10 +00:00
Reid Kleckner 0e2882345d [WinEH] Add some support for code generating catchpad
We can now run 32-bit programs with empty catch bodies.  The next step
is to change PEI so that we get funclet prologues and epilogues.

llvm-svn: 246235
2015-08-27 23:27:47 +00:00
Hal Finkel 7ffe55ae9d [PowerPC] Remove unnecessary braces in PPCVSXFMAMutate
Address Eric's post-commit review of r245741. NFC.

llvm-svn: 246121
2015-08-26 23:41:53 +00:00
Bjarke Hammersholt Roune 6c64738e87 [NVPTX] Let NVPTX backend detect integer min and max patterns.
Summary:
Let NVPTX backend detect integer min and max patterns during isel and emit intrinsics that enable hardware support.


Reviewers: jholewinski, meheff, jingyue

Subscribers: arsenm, llvm-commits, meheff, jingyue, eliben, jholewinski

Differential Revision: http://reviews.llvm.org/D12377

llvm-svn: 246107
2015-08-26 23:22:02 +00:00
Cong Hou b5ef475e5c [ARM] Use BranchProbability::scale() to scale an integer with a probability in ARMBaseInstrInfo.cpp
Previously in isProfitableToIfCvt() in ARMBaseInstrInfo.cpp, the multiplication between an integer and a branch probability was done manually in an unsafe way that may lead to overflow. This patch corrects those cases by using BranchProbability's member function scale(), which stores the intermediate result in int64 to avoid overflow.

Differential Revision: http://reviews.llvm.org/D12295

llvm-svn: 246106
2015-08-26 23:17:52 +00:00
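The overflow described in r246106 can be illustrated with a minimal sketch. The `Prob` struct and both helper names are hypothetical stand-ins invented for this example; this is not LLVM's `BranchProbability` class or its actual `scale()` implementation, only the general idea of widening the intermediate product.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical stand-in for a probability N/D (D != 0, N <= D).
struct Prob {
  uint32_t N;
  uint32_t D;
};

// Manual scaling: the 32-bit product Value * P.N can wrap around
// before the division is applied.
inline uint32_t scaleUnsafe(uint32_t Value, Prob P) {
  return Value * P.N / P.D;
}

// Widened scaling: the product is formed in 64 bits, so it cannot
// overflow for any pair of 32-bit inputs.
inline uint32_t scaleWide(uint32_t Value, Prob P) {
  assert(P.D != 0 && P.N <= P.D && "not a valid probability");
  uint64_t Product = static_cast<uint64_t>(Value) * P.N;
  // Because N <= D, the quotient never exceeds Value and fits in 32 bits.
  return static_cast<uint32_t>(Product / P.D);
}
```

With `Value = 3000000000` and a probability of 3/4, the manual version wraps while the widened version returns 2250000000.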
JF Bastien b1b61ebb21 WebAssembly: NFC comment update
llvm-svn: 246101
2015-08-26 23:03:07 +00:00
JF Bastien 45479f627a WebAssembly: handle private/internal globals.
Things of note:
 - Other linkage types aren't handled yet. We'll figure it out with dynamic linking.
 - Special LLVM globals are either ignored, or error out for now.
 - TLS isn't supported yet (WebAssembly will have threads later).
 - There currently isn't a syntax for alignment, I left it in a comment so it's easy to hook up.
 - Undef is converted to whatever the type's appropriate null value is.
 - assert versus report_fatal_error: follow what other AsmPrinters do, and assert only on what should have been caught elsewhere.

llvm-svn: 246092
2015-08-26 22:09:54 +00:00
Reid Kleckner c2b9254426 [ms-inline-asm] Relax assertion around funky identifiers slightly
A corresponding clang change will make it so that clang can consume part
of an assembler token. The assembler treats '.' as an identifier
character while clang does not, so its view of the token stream is a
little different.

llvm-svn: 246089
2015-08-26 21:57:25 +00:00
Mehdi Amini 0ab4b5b52e Fix LLVM C API for DataLayout
We removed access to the DataLayout on the TargetMachine and
deprecated the C API function LLVMGetTargetMachineData() in r243114.
However the way I tried to be backward compatible was broken: I
changed the wrapper of the TargetMachine to be a structure that
includes the DataLayout as well. However the TargetMachine is also
wrapped by the ExecutionEngine, in the more classic way. A client
using the TargetMachine wrapped by the ExecutionEngine and trying
to get the DataLayout would break.

It seems tricky to solve the problem completely in the C API
implementation. This patch tries to address this backward
compatibility in a lighter way in the C++ API. The C API is
restored in its original state and the removed C++ API is
reintroduced, but privately. The C API is friended to the
TargetMachine and should be the only consumer for this API.

Reviewers: ributzka

Differential Revision: http://reviews.llvm.org/D12263

From: Mehdi Amini <mehdi.amini@apple.com>
llvm-svn: 246082
2015-08-26 21:16:29 +00:00
Matt Arsenault 8a067121f8 AMDGPU: Delete dead code
There is no context where s_mov_b64 is emitted
and could potentially be moved to the VALU.
It is currently only emitted for materializing
immediates, which can't be dependent on vector sources.

The immediate splitting is already done when selecting
constants. I'm not sure in what contexts, if any, the register
splitting would have been used before.

Also clean up using s_mov_b64 in place of v_mov_b64_pseudo,
although this isn't required and just skips the extra step
of eliminating the copy from the SReg_64.

llvm-svn: 246080
2015-08-26 20:48:08 +00:00
Matt Arsenault 5e7f95e567 AMDGPU: Don't reprocess instructions when splitting i64 bcnt
llvm-svn: 246079
2015-08-26 20:48:04 +00:00
Matt Arsenault 445833cc91 AMDGPU: Fix not moving users of s_bfe_i64 to VALU
This wouldn't propagate to users of the original BFE
and would hit a verifier error.

llvm-svn: 246078
2015-08-26 20:47:58 +00:00
Matt Arsenault f003c38e1e AMDGPU: Don't create intermediate SALU instructions
When splitting 64-bit operations, create the correct
VALU instructions immediately.

This was splitting things like s_or_b64 into the two
s_or_b32s and then pushing the new instructions
onto the worklist. There's no reason we need
to do this intermediate step.

llvm-svn: 246077
2015-08-26 20:47:50 +00:00
Andrew Kaylor af083d4cf9 Expose hasLiveCondCodeDef as a member function of the X86InstrInfo class. NFC
This takes the existing static function hasLiveCondCodeDef and makes it a member function of the X86InstrInfo class. This is a useful utility function that an upcoming change would like to use. NFC.

Patch by: Kevin B. Smith
Differential Revision: http://reviews.llvm.org/D12371

llvm-svn: 246073
2015-08-26 20:36:52 +00:00
Mehdi Amini 31ebf03c09 Revert "Fix LLVM C API for DataLayout"
This reverts commit r246052.
Third attempt, still unpleasant for some bots.

From: Mehdi Amini <mehdi.amini@apple.com>
llvm-svn: 246057
2015-08-26 19:24:59 +00:00
Matt Arsenault 602a16d3db AMDGPU/SI: Report SIFixSGPRLiveRanges changed function
llvm-svn: 246056
2015-08-26 19:12:03 +00:00
Mehdi Amini 9d692b6805 Fix LLVM C API for DataLayout
We removed access to the DataLayout on the TargetMachine and
deprecated the C API function LLVMGetTargetMachineData() in r243114.
However the way I tried to be backward compatible was broken: I
changed the wrapper of the TargetMachine to be a structure that
includes the DataLayout as well. However the TargetMachine is also
wrapped by the ExecutionEngine, in the more classic way. A client
using the TargetMachine wrapped by the ExecutionEngine and trying
to get the DataLayout would break.

It seems tricky to solve the problem completely in the C API
implementation. This patch tries to address this backward
compatibility in a lighter way in the C++ API. The C API is
restored in its original state and the removed C++ API is
reintroduced, but privately. The C API is friended to the
TargetMachine and should be the only consumer for this API.

Reviewers: ributzka

Differential Revision: http://reviews.llvm.org/D12263

From: Mehdi Amini <mehdi.amini@apple.com>
llvm-svn: 246052
2015-08-26 18:56:01 +00:00
Matt Arsenault bd66061db7 AMDGPU: Make sure to reserve super registers
I think this could potentially have broken if
one of the super registers were allocated
that contain v254/v255.

llvm-svn: 246051
2015-08-26 18:54:50 +00:00
Mehdi Amini 8b3dda3f71 Revert "Fix LLVM C API for DataLayout"
This reverts commit r246044.
Build broken, still. It builds for me...

From: Mehdi Amini <mehdi.amini@apple.com>
llvm-svn: 246049
2015-08-26 18:37:59 +00:00
Matt Arsenault 19c5488015 AMDGPU: Produce error on dynamic_stackalloc
llvm-svn: 246048
2015-08-26 18:37:13 +00:00
Mehdi Amini b5d8b27fc8 Fix LLVM C API for DataLayout
We removed access to the DataLayout on the TargetMachine and
deprecated the C API function LLVMGetTargetMachineData() in r243114.
However the way I tried to be backward compatible was broken: I
changed the wrapper of the TargetMachine to be a structure that
includes the DataLayout as well. However the TargetMachine is also
wrapped by the ExecutionEngine, in the more classic way. A client
using the TargetMachine wrapped by the ExecutionEngine and trying
to get the DataLayout would break.

It seems tricky to solve the problem completely in the C API
implementation. This patch tries to address this backward
compatibility in a lighter way in the C++ API. The C API is
restored in its original state and the removed C++ API is
reintroduced, but privately. The C API is friended to the
TargetMachine and should be the only consumer for this API.

Reviewers: ributzka

Differential Revision: http://reviews.llvm.org/D12263

From: Mehdi Amini <mehdi.amini@apple.com>
llvm-svn: 246044
2015-08-26 18:22:34 +00:00
James Y Knight 3602286937 [SPARC] Fix stupid oversight in stack realignment support.
If you're going to realign %sp to get object alignment properly (which
the code does), and stack offsets and alignments are calculated going
down from %fp (which they are), then the total stack size had better
be a multiple of the alignment. LLVM did indeed ensure that.

And then, after aligning, the sparc frame code added 96 (for sparcv8)
to the frame size, making any requested alignment of 64-bytes or
higher *guaranteed* to be misaligned. The test case added with r245668
even tests this exact scenario, and asserted the incorrect behavior,
which I somehow failed to notice. D'oh.

This change fixes the frame lowering code to align the stack size
*after* adding the spill area, instead.

Differential Revision: http://reviews.llvm.org/D12349

llvm-svn: 246042
2015-08-26 17:57:51 +00:00
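The ordering problem described in r246042 can be shown with a small arithmetic sketch. The function names and the `SpillArea` constant are hypothetical, chosen for illustration; only the value 96 (for sparcv8) and the "align after adding" order come from the message above. This is not the Sparc frame lowering code itself.

```cpp
#include <cassert>
#include <cstdint>

// Round Size up to the next multiple of Align (a power of two).
inline uint64_t alignTo(uint64_t Size, uint64_t Align) {
  assert(Align != 0 && (Align & (Align - 1)) == 0 && "power of two expected");
  return (Size + Align - 1) & ~(Align - 1);
}

// Assumed name for the 96 bytes the message says are added on sparcv8.
const uint64_t SpillArea = 96;

// Old order: align first, then add 96. The sum is no longer a multiple
// of any alignment of 64 bytes or more.
inline uint64_t frameSizeAlignThenAdd(uint64_t Locals, uint64_t Align) {
  return alignTo(Locals, Align) + SpillArea;
}

// Fixed order: add the spill area first, then align the total.
inline uint64_t frameSizeAddThenAlign(uint64_t Locals, uint64_t Align) {
  return alignTo(Locals + SpillArea, Align);
}
```

For 100 bytes of locals and a 64-byte alignment, the old order yields 224 (not a multiple of 64) and the fixed order yields 256.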
Vedant Kumar bf891b12b4 [llvm-mc] Ignore opcode size prefix in 64-bit CALL disassembly
This is a fix for disassembling unusual instruction sequences in 64-bit
mode w.r.t the CALL rel16 instruction. It might be desirable to move the
check somewhere else, but it essentially mimics the special case
handling with JCXZ in 16-bit mode.

The current behavior accepts the opcode size prefix and causes the
call's immediate to stop disassembling after 2 bytes. When debugging
sequences of instructions with this pattern, the disassembler output
becomes extremely unreliable and essentially useless (if you jump midway
into what lldb thinks is a unified instruction, you'll lose %rip). So we
ignore the prefix and consume all 4 bytes when disassembling a 64-bit
mode binary.

Note: in Vol. 2A 3-99 the Intel spec states that CALL rel16 is N.S. N.S.
is defined as:

    Indicates an instruction syntax that requires an address override
    prefix in 64-bit mode and is not supported. Using an address
    override prefix in 64-bit mode may result in model-specific
    execution behavior. (Vol. 2A 3-7)

Since 0x66 is an operand override prefix we should be OK (although we
may want to warn about 0x67 prefixes to 0xe8). On the CPUs I tested
with, they all ignore the 0x66 prefix in 64-bit mode.

Patch by Matthew Barney!

Differential Revision: http://reviews.llvm.org/D9573

llvm-svn: 246038
2015-08-26 16:20:29 +00:00
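The behavior described in r246038 can be sketched as a rule for how many immediate bytes follow the 0xe8 opcode. The `Mode` enum and both function names are hypothetical and written only for this illustration; the real check lives in LLVM's x86 disassembler and is structured differently.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>

enum class Mode { Bits16, Bits32, Bits64 };

// Number of displacement bytes after the 0xe8 opcode.
inline unsigned callRelImmSize(Mode M, bool HasOpSizePrefix) {
  switch (M) {
  case Mode::Bits64:
    return 4; // the 0x66 prefix is ignored, as the message describes
  case Mode::Bits32:
    return HasOpSizePrefix ? 2 : 4;
  case Mode::Bits16:
    return HasOpSizePrefix ? 4 : 2;
  }
  return 4;
}

// Read the little-endian, sign-extended displacement.
inline int32_t readCallDisp(const uint8_t *Imm, size_t Avail, Mode M,
                            bool HasOpSizePrefix) {
  unsigned Size = callRelImmSize(M, HasOpSizePrefix);
  assert(Avail >= Size && "truncated instruction");
  uint32_t Raw = 0;
  for (unsigned I = 0; I < Size; ++I)
    Raw |= static_cast<uint32_t>(Imm[I]) << (8 * I);
  if (Size == 2)
    return static_cast<int16_t>(Raw); // sign-extend a 16-bit displacement
  return static_cast<int32_t>(Raw);
}
```

Under this rule a 64-bit-mode CALL always consumes all 4 displacement bytes, so the bytes after it are not misread as the start of a new instruction.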
Chad Rosier 9f4709b261 [AArch64] Remove a use-after-free when collecting stats.
The call to mergePairedInsns() deletes MI, so the later use by isUnscaledLdSt()
is referencing freed memory.

llvm-svn: 246033
2015-08-26 13:39:48 +00:00
Silviu Baranga db1ddb32ce [AArch64] Unify the integer min/max vector selection patterns with the intrinsic ones
Summary:
This change lowers the aarch64 integer vector min/max intrinsic nodes to
generic min/max nodes and replaces the intrinsic selection patterns with
the generic ones.

There should already be testing in place for this, so no further tests
were added.

Reviewers: jmolloy

Subscribers: aemerson, llvm-commits, rengolin

Differential Revision: http://reviews.llvm.org/D12276

llvm-svn: 246030
2015-08-26 11:11:14 +00:00
Matthias Braun ccfc9c8d6d FastISel: Use finishCondBranch() for ARM,Mips,PowerPC FastISel
Note that after this change branch probabilities are preserved now.

llvm-svn: 245998
2015-08-26 01:55:47 +00:00
Matthias Braun 17af607796 FastISel: Factor out common code; NFC intended
This should be no functional change but for the record: For three cases
in X86FastISel this will change the order in which the FalseMBB and
TrueMBB of a conditional branch are added to the successor/predecessor
lists.

llvm-svn: 245997
2015-08-26 01:38:00 +00:00
JF Bastien 1a4aa1589b WebAssembly: add small FIXME for AsmPrinter.
Suggested by @sunfish as a follow-up to r245982.

llvm-svn: 245996
2015-08-26 00:50:49 +00:00
Charles Davis 119525914c Make variable argument intrinsics behave correctly in a Win64 CC function.
Summary:
This change makes the variable argument intrinsics, `llvm.va_start` and
`llvm.va_copy`, and the `va_arg` instruction behave as they do on Windows
inside a `CallingConv::X86_64_Win64` function. It's needed for a Clang patch
I have to add support for GCC's `__builtin_ms_va_list` constructs.

Reviewers: nadav, asl, eugenis

CC: llvm-commits

Differential Revision: http://llvm-reviews.chandlerc.com/D1622

llvm-svn: 245990
2015-08-25 23:27:41 +00:00
JF Bastien 54be3b1f03 WebAssembly: assert that there aren't any constant pools
WebAssembly will either use globals or immediates, since it's a virtual ISA.

llvm-svn: 245989
2015-08-25 23:19:49 +00:00
JF Bastien b6091dfe0f WebAssembly: emit `(func (param t) (result t))` s-expressions
Summary: Match spec format: https://github.com/WebAssembly/spec/blob/master/ml-proto/test/fac.wasm

Reviewers: sunfish

Subscribers: llvm-commits, jfb

Differential Revision: http://reviews.llvm.org/D12307

llvm-svn: 245986
2015-08-25 22:58:05 +00:00
JF Bastien 289287060b WebAssembly: comment out .globl when printing textual assembly
Do the same for .weak (not implemented for now, but may as well do it). Update comment string to two semicolons.

llvm-svn: 245982
2015-08-25 22:23:15 +00:00
Sanjay Patel deb8f826a5 make fast unaligned memory accesses implicit with SSE4.2 or SSE4a
This is a follow-on from the discussion in http://reviews.llvm.org/D12154.

This change allows memset/memcpy to use SSE or AVX memory accesses for any chip that has
generally fast unaligned memory ops.

A motivating use case for this change is a clang invocation that doesn't explicitly set
the CPU, but does target a feature that we know only exists on a CPU that supports fast
unaligned memops. For example:
$ clang -O1 foo.c -mavx

This resolves a difference in lowering noted in PR24449:
https://llvm.org/bugs/show_bug.cgi?id=24449

Before this patch, we used different store types depending on whether the example can be
lowered as a memset or not.

Differential Revision: http://reviews.llvm.org/D12288

llvm-svn: 245950
2015-08-25 16:29:21 +00:00
Michael Kuperstein 6e3fee07f7 [X86] Remove references to _ftol2
As of r245924, _ftol2 is no longer used for fptoui on MS platforms.
Remove the dead code associated with it.

llvm-svn: 245925
2015-08-25 07:58:33 +00:00
Michael Kuperstein 8515893be8 [X86] Fix fptoui conversions
This fixes two issues in x86 fptoui lowering.
1) Makes conversions from f80 go through the right path on AVX-512.
2) Implements an inline sequence for fptoui i64 instead of a library
call. This improves performance by 6X on SSE3+ and 3X otherwise.
Incidentally, it also removes the use of ftol2 for fptoui, which was
wrong to begin with, as ftol2 converts to a signed i64, producing
wrong results for values >= 2^63.

Patch by: mitch.l.bodart@intel.com
Differential Revision: http://reviews.llvm.org/D11316

llvm-svn: 245924
2015-08-25 07:42:09 +00:00
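The reason a signed conversion gives wrong results for values >= 2^63, as noted in r245924, can be seen in a short sketch of one common way to build an unsigned conversion from a signed one. The function name `fpToUI64` is hypothetical, and this is an assumption about the general technique, not the instruction sequence the patch actually emits.

```cpp
#include <cassert>
#include <cstdint>

// Convert a double in [0, 2^64) to uint64_t using only a signed
// double -> int64_t conversion.
inline uint64_t fpToUI64(double X) {
  assert(X >= 0.0 && X < 18446744073709551616.0 && "outside u64 range");
  const double TwoPow63 = 9223372036854775808.0;
  if (X < TwoPow63)
    return static_cast<uint64_t>(static_cast<int64_t>(X));
  // A direct signed conversion would be out of range here, so bias the
  // input down by 2^63, convert, and restore the top bit afterwards.
  int64_t Low = static_cast<int64_t>(X - TwoPow63);
  return static_cast<uint64_t>(Low) | (uint64_t{1} << 63);
}
```

Inputs below 2^63 take the plain signed path; larger inputs take the biased path, which is the case a signed-only helper such as ftol2 cannot represent.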
Steve King 5cdbd20cc3 Pass function attributes instead of boolean in isIntDivCheap().
llvm-svn: 245921
2015-08-25 02:31:21 +00:00
Mehdi Amini f83b865448 Revert "Fix LLVM C API for DataLayout"
This reverts commit 433bfd94e4b7e3cc3f8b08f8513ce47817941b0c.
Broke some bot, have to see why it passed locally.

From: Mehdi Amini <mehdi.amini@apple.com>
llvm-svn: 245917
2015-08-25 01:21:09 +00:00
Mehdi Amini 84b2e325d3 Fix LLVM C API for DataLayout
We removed access to the DataLayout on the TargetMachine and
deprecated the C API function LLVMGetTargetMachineData() in r243114.
However the way I tried to be backward compatible was broken: I
changed the wrapper of the TargetMachine to be a structure that
includes the DataLayout as well. However the TargetMachine is also
wrapped by the ExecutionEngine, in the more classic way. A client
using the TargetMachine wrapped by the ExecutionEngine and trying
to get the DataLayout would break.

It seems tricky to solve the problem completely in the C API
implementation. This patch tries to address this backward
compatibility in a lighter way in the C++ API. The C API is
restored in its original state and the removed C++ API is
reintroduced, but privately. The C API is friended to the
TargetMachine and should be the only consumer for this API.

Reviewers: ributzka

Differential Revision: http://reviews.llvm.org/D12263

From: Mehdi Amini <mehdi.amini@apple.com>
llvm-svn: 245916
2015-08-25 01:07:25 +00:00
Hal Finkel 0f2ddcb83f [PowerPC] PPCVSXFMAMutate should ignore trivial-copy addends
We might end up with a trivial copy as the addend, and if so, we should ignore
the corresponding FMA instruction. The trivial copy can be coalesced away later,
so there's nothing to do here. We should not, however, assert. Fixes PR24544.

llvm-svn: 245907
2015-08-24 23:48:28 +00:00
Matthias Braun b2b7ef1de8 MachineBasicBlock: Add liveins() method returning an iterator_range
llvm-svn: 245895
2015-08-24 22:59:52 +00:00
Dan Gohman 2683a5534e [WebAssembly] DYNAMIC_STACKALLOC returns a pointer.
llvm-svn: 245893
2015-08-24 22:31:52 +00:00
JF Bastien af111db8af WebAssembly: Implement call
Summary: Support function calls.

Reviewers: sunfish, sunfishcode

Subscribers: sunfishcode, jfb, llvm-commits

Differential revision: http://reviews.llvm.org/D12219

llvm-svn: 245887
2015-08-24 22:16:48 +00:00
JF Bastien 19c2e6634d Revert two bad commits.
Summary: I forgot to squash git commits before doing an svn dcommit of D12219. Reverting, and re-submitting.

Subscribers: jfb, llvm-commits

Differential Revision: http://reviews.llvm.org/D12298

llvm-svn: 245886
2015-08-24 22:07:33 +00:00
JF Bastien 744ad106c3 Missing print.
llvm-svn: 245883
2015-08-24 22:00:04 +00:00
JF Bastien d8a9d66d50 call
llvm-svn: 245882
2015-08-24 21:59:51 +00:00
Dan Gohman 12e1997e4b [WebAssembly] Make the assembly printer indent instructions.
llvm-svn: 245875
2015-08-24 21:19:48 +00:00
Dan Gohman 69c4c76396 [WebAssembly] CodeGen support for __builtin_wasm_page_size()
llvm-svn: 245872
2015-08-24 21:03:24 +00:00
Bill Schmidt 32fd189de2 [PPC64LE] Fix PR24546 - Swap optimization and debug values
This patch fixes PR24546, which demonstrates a segfault during the VSX
swap removal pass.  The problem is that debug value instructions were
not excluded from the list of instructions to be analyzed for webs of
related computation.  I've added the test case from the PR as a crash
test in test/CodeGen/PowerPC.

llvm-svn: 245862
2015-08-24 19:27:27 +00:00
Dan Gohman 7b63484b99 [WebAssembly] Skeleton FastISel support
llvm-svn: 245860
2015-08-24 18:44:37 +00:00
Dan Gohman 896e53fae8 [WebAssembly] Implement floating point rounding operators.
llvm-svn: 245859
2015-08-24 18:23:13 +00:00
Dan Gohman 01612f627d [WebAssembly] Tell TargetTransformInfo about popcnt and sqrt.
llvm-svn: 245853
2015-08-24 16:51:46 +00:00
Dan Gohman e419a7c307 [WebAssembly] Use the checked form of MachineFunction::getSubtarget. NFC.
llvm-svn: 245852
2015-08-24 16:46:31 +00:00
Dan Gohman 08fc966d3c [WebAssembly] Implement the is_zero_undef forms of cttz and ctlz
llvm-svn: 245851
2015-08-24 16:39:37 +00:00
Michael Zuckerman 9beca2e7e2 [X86] Add support for mmword memory operand size for Intel-syntax x86 assembly
Differential Revision: http://reviews.llvm.org/D12151

llvm-svn: 245835
2015-08-24 10:26:54 +00:00
Scott Douglass bdef60462d [ARM] Use AEABI helpers for i64 div and rem
Differential Revision: http://reviews.llvm.org/D12232

llvm-svn: 245830
2015-08-24 09:17:18 +00:00
Scott Douglass d2974a6afa [ARM] Refactor LowerDivRem before adding LowerREM (nfc)
Differential Revision: http://reviews.llvm.org/D12230

llvm-svn: 245829
2015-08-24 09:17:11 +00:00
Michael Zuckerman 2fe19db94f first commit to llvm
llvm-svn: 245825
2015-08-24 07:48:50 +00:00
Mehdi Amini a758398833 Add missing break in AArch64DAGToDAGISel::Select() switch case
Reported by coverity.

From: Mehdi Amini <mehdi.amini@apple.com>
llvm-svn: 245800
2015-08-23 00:42:57 +00:00
Jingyue Wu fcec09866a [NVPTX] Allow undef value as global initializer
Summary:
A __shared__ variable may now emit an undef value as its initializer; do not
throw an error on that.

Test Plan: test/CodeGen/NVPTX/global-addrspace.ll

Patch by Xuetian Weng

Reviewers: jholewinski, tra, jingyue

Subscribers: llvm-commits, jholewinski

Differential Revision: http://reviews.llvm.org/D12242

llvm-svn: 245785
2015-08-22 05:40:26 +00:00
Matt Arsenault 0a3ac1be43 AMDGPU: Allow specifying different opcode on VI for SMRD/SMEM
Although the basic s_load_* instructions happen to use the same
opcode, some of the special case SMRD instructions have
different opcodes.

llvm-svn: 245775
2015-08-22 00:54:31 +00:00
Matt Arsenault e8df879948 AMDGPU: Improve accuracy of instruction rates for some FP instructions
llvm-svn: 245774
2015-08-22 00:50:41 +00:00
Matt Arsenault 33010103b7 AMDGPU: Use DFS to avoid second loop over function
llvm-svn: 245772
2015-08-22 00:43:38 +00:00
Matt Arsenault c8d8e4ed76 AMDGPU: Make sure to run verifier after SIFixSGPRLiveRanges
llvm-svn: 245769
2015-08-22 00:19:34 +00:00
Matt Arsenault aba29d6ab1 AMDGPU: Improve debug printing in SIFixSGPRLiveRanges
llvm-svn: 245768
2015-08-22 00:19:25 +00:00
Matt Arsenault 6adf07a92e AMDGPU: Move CI instructions into CIInstructions.td
There are still a couple of CI patterns left in SIInstructions.

llvm-svn: 245767
2015-08-22 00:16:34 +00:00
Matt Arsenault f56872dc30 AMDGPU: Minor cleanups to help with f16 support
The main change is inverting the condition for the
operand class classes so that VT.Size == 16 uses VGPR_32
instead of 64.

llvm-svn: 245764
2015-08-21 23:49:51 +00:00
Tom Stellard bd8a0856e2 AMDGPU/SI: Better handle s_wait insertion
We can wait on either VM, EXP or LGKM.
The waits are independent.

Without this patch, a wait inserted because of one of them
would also wait for all the previous others.
This patch makes s_wait only wait for the ones we need for the next
instruction.

Here's an example of subtle perf reduction this patch solves:

This is without the patch:

buffer_load_format_xyzw v[8:11], v0, s[44:47], 0 idxen
buffer_load_format_xyzw v[12:15], v0, s[48:51], 0 idxen
s_load_dwordx4 s[44:47], s[8:9], 0xc
s_waitcnt lgkmcnt(0)
buffer_load_format_xyzw v[16:19], v0, s[52:55], 0 idxen
s_load_dwordx4 s[48:51], s[8:9], 0x10
s_waitcnt vmcnt(1)
buffer_load_format_xyzw v[20:23], v0, s[44:47], 0 idxen

The s_waitcnt vmcnt(1) is useless.
It is added because the last
buffer_load_format_xyzw needs s[44:47], which was loaded
by the first s_load_dwordx4. It waits for all VM operations
issued before that load to have finished.

Internally, 3 counters (for VM, EXP and LGKM) are updated after every
instruction. For example, buffer_load_format_xyzw will increase the VM
counter, and s_load_dwordx4 the LGKM one.

Without the patch, for every defined register,
the current 3 counters are stored, and are used to know
how long to wait when an instruction needs the register.

Because of that, the counters stored for s[44:47] say that, to use the
register, you need to wait for the previous buffer_load_format_xyzw.

Instead this patch stores only the counters that matter for the register,
and puts zero for the other ones, since we don't need any wait for them.

Patch by: Axel Davy

Differential Revision: http://reviews.llvm.org/D11883

llvm-svn: 245755
2015-08-21 22:47:27 +00:00
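The counter bookkeeping described in r245755 can be modeled in a few lines. This is a deliberately simplified sketch: the names `snapshotAll`, `snapshotOnly`, `needsWait` and the `WaitedOn` bookkeeping are assumptions made for illustration, not the data structures of the actual AMDGPU wait-insertion pass.

```cpp
#include <array>
#include <cassert>

enum Counter { VM = 0, EXP = 1, LGKM = 2, NumCounters = 3 };
using Counters = std::array<unsigned, NumCounters>;

// Behavior before the patch: a defined register remembers the current
// value of all three counters.
inline Counters snapshotAll(const Counters &Issued) { return Issued; }

// Behavior after the patch: remember only the counter that the defining
// instruction increments; zero means "no wait needed" for the others.
inline Counters snapshotOnly(const Counters &Issued, Counter Kind) {
  assert(Kind < NumCounters && "unknown counter");
  Counters S{};
  S[Kind] = Issued[Kind];
  return S;
}

// Using a register requires a wait on counter C only if the register's
// snapshot is newer than what has already been waited on.
inline bool needsWait(const Counters &Snapshot, const Counters &WaitedOn,
                      Counter C) {
  return Snapshot[C] > WaitedOn[C];
}
```

In the listing above, s[44:47] is defined after two VM loads and one LGKM load, and `s_waitcnt lgkmcnt(0)` has already run. With `snapshotAll` the later use still demands a VM wait; with `snapshotOnly` it demands none.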
Vedant Kumar 366dd9fd2b [ARM] Fix MachO CPU Subtype selection
Differential Revision: http://reviews.llvm.org/D12040

llvm-svn: 245744
2015-08-21 21:52:48 +00:00
Hal Finkel ff9639d6b7 [PowerPC] PPCVSXFMAMutate should not segfault on undef input registers
When PPCVSXFMAMutate would look at the input addend register, it would get its
input value number. This would fail, however, if the register was undef,
causing a segfault. Don't segfault (just skip such FMA instructions).

Fixes the test case from PR24542 (although that may have been over-reduced).

llvm-svn: 245741
2015-08-21 21:34:24 +00:00
Sanjay Patel f0bc07f7a5 [x86] enable machine combiner reassociations for 256-bit vector min/max
llvm-svn: 245735
2015-08-21 21:04:21 +00:00
Sanjay Patel dddad10241 remove 'FeatureSlowUAMem' from AMD CPUs based on 10H micro-arch or later
See discussion in D12154 ( http://reviews.llvm.org/D12154 ), AMD Software
Optimization Guides for 10H/12H/15H/16H, and Agner Fog's experimental data.

llvm-svn: 245733
2015-08-21 20:39:17 +00:00
Sanjay Patel 9e916dc48d [x86] invert logic for attribute 'FeatureFastUAMem'
This is a 'no functional change intended' patch. It removes one FIXME, but adds several more.

Motivation: the FeatureFastUAMem attribute may be too general. It is used to determine if any
sized misaligned memory access under 32-bytes is 'fast'. From the added FIXME comments, however,
you can see that we're not consistent about this. Changing the name of the attribute makes it
clearer to see the logic holes.

Changing this to a 'slow' attribute also means we don't have to add an explicit 'fast' attribute
to new chips; fast unaligned accesses have been standard for several generations of CPUs now.

Differential Revision: http://reviews.llvm.org/D12154

llvm-svn: 245729
2015-08-21 20:17:26 +00:00
Sanjay Patel cf942fa905 [x86] enable machine combiner reassociations for 128-bit vector min/max
llvm-svn: 245715
2015-08-21 18:06:49 +00:00
Eric Christopher e5e302f7e0 Fix typo - symetric -> symmetric.
llvm-svn: 245705
2015-08-21 16:23:39 +00:00
James Y Knight 667395f334 [Sparc] Support user-specified stack object overalignment.
Note: I do not implement a base pointer, so it's still impossible to
have dynamic realignment AND dynamic alloca in the same function.

This also moves the code for determining the frame index reference
into getFrameIndexReference, where it belongs, instead of inline in
eliminateFrameIndex.

[Begin long-winded screed]

Now, stack realignment for Sparc is actually a silly thing to support,
because the Sparc ABI has no need for it -- unlike the situation on
x86, the stack is ALWAYS aligned to the required alignment for the CPU
instructions: 8 bytes on sparcv8, and 16 bytes on sparcv9.

However, LLVM unfortunately implements user-specified overalignment
using stack realignment support, so for now, I'm going to go along
with that tradition. GCC instead treats objects which have alignment
specification greater than the maximum CPU-required alignment for the
target as a separate block of stack memory, with their own virtual
base pointer (which gets aligned). Doing it that way avoids needing to
implement per-target support for stack realignment, except for the
targets which *actually* have an ABI-specified stack alignment which
is too small for the CPU's requirements.

Further unfortunately in LLVM, the default canRealignStack for all
targets effectively returns true, despite that implementing that is
something a target needs to do specifically. So, the previous behavior
on Sparc was to silently ignore the user's specified stack
alignment. Ugh.

Yet MORE unfortunate, if a target actually does return false from
canRealignStack, that also causes the user-specified alignment to be
*silently ignored*, rather than emitting an error.

(I started looking into fixing that last, but it broke a bunch of
tests, because LLVM actually *depends* on having it silently ignored:
some architectures (e.g. non-linux i386) have smaller stack alignment
than spilled-register alignment. But, the fact that a register needs
spilling is not known until within the register allocator. And by that
point, the decision to not reserve the frame pointer has been frozen
in place. And without a frame pointer, stack realignment is not
possible. So, canRealignStack() returns false, and
needsStackRealignment() then returns false, assuming everyone can just
go on their merry way assuming the alignment requirements were
probably just suggestions after-all. Sigh...)

Differential Revision: http://reviews.llvm.org/D12208

llvm-svn: 245668
2015-08-21 04:17:56 +00:00
NAKAMURA Takumi cf61aae163 SparcAsmParser.cpp: Appease msc x86.
llvm-svn: 245661
2015-08-21 01:12:19 +00:00
Matthias Braun 46e5639806 AArch64: Fix cmp;ccmp ordering
When producing conditional compare sequences for or operations we need
to negate the operands and the finally tested flags. The thing is if we negate
the finally tested flags this equals a logical negation of all previously
emitted expressions. There was a case missing where we have to order OR
expressions so they get emitted first.

This fixes http://llvm.org/PR24459

llvm-svn: 245641
2015-08-20 23:33:34 +00:00
Matthias Braun 266204b7dc AArch64: Do not create CCMP on multiple users.
Creating CMP;CCMP sequences from and/or trees does not gain us anything if
the and/or tree is materialized to a GP register anyway. While most of
the code already checked for hasOneUse() there was one important case
missing.

llvm-svn: 245640
2015-08-20 23:33:31 +00:00
Dan Gohman 32907a6b21 [WebAssembly] Mark more operators as Expand.
llvm-svn: 245636
2015-08-20 22:57:13 +00:00
Ahmed Bougacha 0cdc7719f0 [X86] Look for scalar through one bitcast when lowering to VBROADCAST.
Fixes PR23464: one way to use the broadcast intrinsics is:

  _mm256_broadcastw_epi16(_mm_cvtsi32_si128(*(int*)src));

We don't currently fold this, but now that we use native IR for
the intrinsics (r245605), we can look through one bitcast to find
the broadcast scalar.

Differential Revision: http://reviews.llvm.org/D10557

llvm-svn: 245613
2015-08-20 21:02:39 +00:00
Jingyue Wu ca3ef11a9b [NVPTX] truncating 64-bit to 32-bit is free
Summary:
Add an LSR test that exercises isTruncateFree. Without this change, LSR creates
another indvar representing the truncated value.

Reviewers: jholewinski, eliben

Subscribers: jholewinski, llvm-commits

Differential Revision: http://reviews.llvm.org/D12058

llvm-svn: 245611
2015-08-20 20:59:02 +00:00
Ahmed Bougacha 1a498705e4 [X86] Replace avx2 broadcast intrinsics with native IR.
Since r245605, the clang headers don't use these anymore.
r245165 updated some of the tests already; update the others, add
an autoupgrade, remove the intrinsics, and cleanup the definitions.

Differential Revision: http://reviews.llvm.org/D10555

llvm-svn: 245606
2015-08-20 20:36:19 +00:00
James Molloy bf17009a97 [ARM] Don't try and custom lower a vNi64 SETCC.
It won't go well. We've already marked 64-bit SETCCs as non-Custom, but it's just possible that a SETCC has a legal result type but an illegal operand type. If this happens, bail out before we create unselectable nodes.

Fixes PR24292. I tried to create a testcase but in 99% of cases we can't trigger this - not surprising that this bug has been latent since 2009.

llvm-svn: 245577
2015-08-20 16:33:44 +00:00
Douglas Katzman 58195a2d74 [Sparc]: correct the 'set' synthetic instruction
Differential Revision: http://reviews.llvm.org/D12194

llvm-svn: 245575
2015-08-20 16:16:16 +00:00
Marina Yatsina bce1ab67a5 [X86] Fix FBLD and FBSTP
FBLD and FBSTP should receive TBYTE because it is defined as
FBLD m80
FBSTP m80

Differential Revision: http://reviews.llvm.org/D11748

llvm-svn: 245553
2015-08-20 11:51:24 +00:00
Marina Yatsina 7a4e1ba737 [X86] Fix bug in COMISD and COMISS definition in td files
COMISD should receive QWORD because it is defined as
 (V)COMISD xmm1, xmm2/m64

COMISS should receive DWORD because it is defined as
 (V)COMISS xmm1, xmm2/m32

Differential Revision: http://reviews.llvm.org/D11712

llvm-svn: 245551
2015-08-20 11:21:36 +00:00
David Majnemer cfc1df553e [X86] Fix the (shl (and (setcc_c), c1), c2) -> (and setcc_c, (c1 << c2)) fold
We didn't check for the necessary preconditions before folding a
mask/shift into a single mask.

This fixes PR24516.

llvm-svn: 245544
2015-08-20 09:00:56 +00:00
Hal Finkel 9fdce9adee [PowerPC] Fix value type on XVCMPEQDP for v2f64 comparisons
XVCMPEQDP is used for VSX v2f64 equality comparisons, but the value type needs
to be v2i64 (as that's the corresponding SETCC type).

Fixes PR24225.

llvm-svn: 245535
2015-08-20 03:02:02 +00:00
Hal Finkel be78c25acb [PowerPC] Fix the int2fp(fp2int(x)) DAGCombine to ignore ppc_fp128
This DAGCombine was creating custom SDAG nodes with an illegal ppc_fp128
operand type because it was triggering on f64/f32 int2fp(fp2int(ppc_fp128 x)),
but shouldn't (it should only apply to f32/f64 types). The result was a crash.

llvm-svn: 245530
2015-08-20 01:18:20 +00:00
Sanjay Patel 9e5927fdc3 [x86] enable machine combiner reassociations for scalar double-precision min/max
llvm-svn: 245506
2015-08-19 21:27:27 +00:00
Sanjay Patel 4e3ee1e548 [x86] enable machine combiner reassociations for scalar single-precision maximums
llvm-svn: 245504
2015-08-19 21:18:46 +00:00
Juergen Ributzka b12248e9cd [AArch64][FastISel] Don't fold shifts with UB.
We are already falling back to SelectionDAG when encountering a shift with UB.
This adds the same checks for shifts with UB that get folded into arithmetic or
logical operations.

This fixes rdar://problem/22345295.

llvm-svn: 245499
2015-08-19 20:52:55 +00:00
David Majnemer f25fe64716 [X86] Emit more efficient >= comparisons against 0
We don't do a great job with >= comparisons against zero when the
result is used as an i8.

Given something like:
  void f(long long LL, bool *B) {
    *B = LL >= 0;
  }

We used to generate:
  shrq    $63, %rdi
  xorb    $1, %dil
  movb    %dil, (%rsi)

Now we generate:
  testq   %rdi, %rdi
  setns   (%rsi)
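
A minimal C++ sketch of why the two sequences agree, assuming
two's-complement 64-bit integers. The helper names are hypothetical and
are not part of the patch; the first mirrors the old shift-and-xor
sequence, the second mirrors the test/setns sequence:

```cpp
#include <cassert>
#include <cstdint>
#include <limits>

// Mirrors the old sequence: shrq $63 isolates the sign bit, xorb $1 flips it.
constexpr bool nonNegativeViaShift(std::int64_t LL) {
  return ((static_cast<std::uint64_t>(LL) >> 63) ^ 1u) == 1u;
}

// Mirrors the new sequence: testq sets SF from the sign bit, setns stores !SF.
constexpr bool nonNegativeViaSignFlag(std::int64_t LL) {
  return !(LL < 0);
}

static_assert(nonNegativeViaShift(0) && nonNegativeViaSignFlag(0),
              "zero is non-negative");
static_assert(!nonNegativeViaShift(-1) && !nonNegativeViaSignFlag(-1),
              "-1 is negative");
```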

Differential Revision: http://reviews.llvm.org/D12136

llvm-svn: 245498
2015-08-19 20:51:40 +00:00
Dan Gohman dde8dce6a9 [WebAssembly] Use the default alignment for SIMD types.
Previously WebAssembly's datalayout string had -v128:8:128. This had been an
attempt to declare a certain level of support for unaligned SIMD accesses.
However, clang makes its own determinations for SIMD alignment that are
independent of the datalayout string, so this wasn't actually meaningful.

llvm-svn: 245494
2015-08-19 20:30:20 +00:00
Douglas Katzman 2362b69dd9 [Sparc]: asm-only support for the ldstub instruction.
llvm-svn: 245485
2015-08-19 19:30:57 +00:00
Nemanja Ivanovic 5f1cea4141 Temporary fix for the self-host failures introduced by rL244921.
This revision has introduced an issue that only affects bootstrapped compiler
when it is printing the ASM. I am working on resolving the issue, but in the
meantime, I'm disabling the legalization of scalar_to_vector operation for v2i64
and the associated testing until I can get this fixed.

llvm-svn: 245481
2015-08-19 19:04:47 +00:00
Bruno Cardoso Lopes 27fd06922b [PeepholeOptimizer] Look through PHIs to find additional register sources
Reintroduce r245442. Remove an overly conservative assertion introduced
in r245442. We could replace the assertion to use `shareSameRegisterFile`
instead, but at that point in `insertPHI` we already lost the original
Def subreg to check against. So drop the assertion completely.

Original commit message:

- Teaches the ValueTracker in the PeepholeOptimizer to look through PHI
instructions.
- Add findNextSourceAndRewritePHI method to lookup into multiple sources
returned by the ValueTracker and rewrite PHIs with new sources.

With these changes we can find more register sources and rewrite more
copies to allow coalescing of bitcast instructions. Hence, we eliminate
unnecessary VR64 <-> GR64 copies in x86, but it could be extended to
other archs by marking "isBitcast" on target specific instructions. The
x86 example follows:

A:
  psllq %mm1, %mm0
  movd  %mm0, %r9
  jmp C

B:
  por %mm1, %mm0
  movd  %mm0, %r9
  jmp C

C:
  movd  %r9, %mm0
  pshufw  $238, %mm0, %mm0

Becomes:

A:
  psllq %mm1, %mm0
  jmp C

B:
  por %mm1, %mm0
  jmp C

C:
  pshufw  $238, %mm0, %mm0

Differential Revision: http://reviews.llvm.org/D11197
rdar://problem/20404526

llvm-svn: 245479
2015-08-19 18:53:36 +00:00
Douglas Katzman e5485c651e [SPARC] Enable writing to floating-point-state register.
llvm-svn: 245475
2015-08-19 18:34:48 +00:00
Ahmed Bougacha 9e00ec6195 [AArch64] Improve short-form diags on long-form Match_InvalidOperand.
Since r244955, we try to use the short-form ErrorInfo when both
tries failed, and the long-form match failed on a suffix operand.
However, this means we sometimes mix ErrorInfo and MatchResult
(one manifestation of this being PR24498). Instead, restore both.

llvm-svn: 245469
2015-08-19 17:40:19 +00:00
Renato Golin eb552e83e0 Revert "[AArch64] Simplify/refactor code to ease code review. NFC."
This reverts commit r245443, as it broke AArch64 test-suite tramp3d
with the assertion 'Reg && "Null register has no regunits"'.

llvm-svn: 245455
2015-08-19 16:29:53 +00:00
Derek Schuff 55817ee604 x32. Fixes a bug in x32 exception handling.
This patch updates the X86 lowering so that the Exception Pointer and Selector
are 64-bit wide only if Subtarget.isTarget64BitLP64.

Patch by João Porto

Reviewers: dschuff, rnk
Differential Revision: http://reviews.llvm.org/D12111

llvm-svn: 245454
2015-08-19 16:28:21 +00:00
JF Bastien 5ab87edbb4 x32. Fixes jmp %reg in x32
x32 has 32-bit pointers; x86-64 can't jmp %r32. This patch addresses this issue by explicitly zero-extending brind's target to 64-bits.

Author: jpp

Reviewers: jfb, dschuff, pavel.v.chupin

Subscribers: llvm-commits

Differential revision: http://reviews.llvm.org/D12112

llvm-svn: 245452
2015-08-19 16:17:08 +00:00
James Y Knight 3b0fd753c4 [Sparc] Rename LoadASR and StoreASR from r245360 to *ASI, as was intended.
llvm-svn: 245450
2015-08-19 15:59:49 +00:00
Bruno Cardoso Lopes 61009142b8 Revert "[PeepholeOptimizer] Look through PHIs to find additional register sources"
Revert r245442 while investigating a fix. An assertion hit in
http://lab.llvm.org:8080/green/job/clang-stage1-configure-RA_build/11380

llvm-svn: 245446
2015-08-19 15:10:32 +00:00
James Y Knight d966fb6fef [SPARC] Fix BooleanContents, so that select of a trunc doesn't
eliminate the trunc.

Differential Revision: http://reviews.llvm.org/D10442

llvm-svn: 245444
2015-08-19 14:47:04 +00:00
Chad Rosier 494abf1ad8 [AArch64] Simplify/refactor code to ease code review. NFC.
llvm-svn: 245443
2015-08-19 14:34:54 +00:00
Bruno Cardoso Lopes 0a1c126684 [PeepholeOptimizer] Look through PHIs to find additional register sources
Reapply r243486.

- Teaches the ValueTracker in the PeepholeOptimizer to look through PHI
instructions.
- Add findNextSourceAndRewritePHI method to lookup into multiple sources
returned by the ValueTracker and rewrite PHIs with new sources.

With these changes we can find more register sources and rewrite more
copies to allow coalescing of bitcast instructions. Hence, we eliminate
unnecessary VR64 <-> GR64 copies in x86, but it could be extended to
other archs by marking "isBitcast" on target specific instructions. The
x86 example follows:

A:
  psllq %mm1, %mm0
  movd  %mm0, %r9
  jmp C

B:
  por %mm1, %mm0
  movd  %mm0, %r9
  jmp C

C:
  movd  %r9, %mm0
  pshufw  $238, %mm0, %mm0

Becomes:

A:
  psllq %mm1, %mm0
  jmp C

B:
  por %mm1, %mm0
  jmp C

C:
  pshufw  $238, %mm0, %mm0

Differential Revision: http://reviews.llvm.org/D11197
rdar://problem/20404526

llvm-svn: 245442
2015-08-19 14:34:41 +00:00
Silviu Baranga ad1b19fcb7 [ARM] Add instruction selection patterns for vmin/vmax
Summary:
The mid-end was generating vector smin/smax/umin/umax nodes, but
we were using vbsl to generate the code. This adds the vmin/vmax
patterns and a test to check that we are now generating vmin/vmax
instructions.

Reviewers: rengolin, jmolloy

Subscribers: aemerson, rengolin, llvm-commits

Differential Revision: http://reviews.llvm.org/D12105

llvm-svn: 245439
2015-08-19 14:11:27 +00:00
Joerg Sonnenberger 7d180c59bb Map %fprs to %asr6 in the Sparc assembler parser.
llvm-svn: 245437
2015-08-19 13:55:14 +00:00
Tobias Grosser 85508e804b Revert "[X86] Widen the 'AND' mask if doing so shrinks the encoding size"
This reverts commit 245169 which miscompiles MultiSource/Applications/siod
from LNT.

llvm-svn: 245432
2015-08-19 11:35:10 +00:00
Michael Kuperstein 9fe42604aa [X86] Do not lower scalar sdiv/udiv to a shifts + mul sequence when optimizing for minsize
There are some cases where the mul sequence is smaller, but for the most part,
using a div is preferable. This does not apply to vectors, since x86 doesn't
have vector idiv, and a vector mul/shifts sequence ought to be smaller than a
scalarized division.
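
For illustration, a hedged C++ sketch of the kind of shifts + mul
sequence referred to here, using unsigned 32-bit division by 3 as an
assumed example. The function name is hypothetical; 0xAAAAAAAB is the
usual reciprocal constant for this divisor:

```cpp
#include <cassert>
#include <cstdint>

// Unsigned 32-bit division by 3 as a widening multiply plus a shift.
// 0xAAAAAAAB is ceil(2^33 / 3), so (X * 0xAAAAAAAB) >> 33 == X / 3
// for every 32-bit X.
constexpr std::uint32_t udiv3ViaMul(std::uint32_t X) {
  return static_cast<std::uint32_t>(
      (static_cast<std::uint64_t>(X) * 0xAAAAAAABull) >> 33);
}

static_assert(udiv3ViaMul(9) == 3, "9 / 3");
static_assert(udiv3ViaMul(0xFFFFFFFFu) == 0xFFFFFFFFu / 3, "max / 3");
```

The expanded form needs a constant, a multiply, and a shift, which is
the size cost being weighed against a single div under minsize.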

Differential Revision: http://reviews.llvm.org/D12082

llvm-svn: 245431
2015-08-19 11:21:43 +00:00
Michael Kuperstein dcdab4cd3a [TLI] Refactor "is integer division cheap" queries.
This removes the isPow2SDivCheap() query, as it is not currently used in
any meaningful way. isIntDivCheap() no longer relies on a state variable
(as all in-tree targets set it to false), but the interface allows querying
based on the type and optimization level.

NFC.

Differential Revision: http://reviews.llvm.org/D12082

llvm-svn: 245430
2015-08-19 11:17:59 +00:00
Alex Lorenz f3630113cd MIR Serialization: Serialize the operand's bit mask target flags.
This commit adds support for bit mask target flag serialization to the MIR
printer and the MIR parser. It also adds support for the machine operand's
target flag serialization to the AArch64 target.

Reviewers: Duncan P. N. Exon Smith
llvm-svn: 245383
2015-08-18 22:52:15 +00:00
Sanjay Patel 5c55fbc5ea use TLI.allowsMemoryAccess() to check if memory accesses are fast; NFCI
This consolidates use of isUnalignedMem32Slow() in one place.
There is a slight change in logic although I'm not sure that it would ever
come up in the real world: we were assuming that an alignment of the type 
size is always fast; now, we actually check the data layout to confirm that.

llvm-svn: 245382
2015-08-18 22:48:12 +00:00
Joerg Sonnenberger b0ce8747c3 Load/store instructions for floating points with address space require SparcV9.
To properly handle this, define the *a instructions as separate
instruction classes by refactoring the LoadA and StoreA multiclasses.
Move the instruction tests into the sparcv9 file to test the difference.

llvm-svn: 245360
2015-08-18 21:31:46 +00:00
David Majnemer 0ad363eebc [WinEH] Calculate state numbers for the new EH representation
State numbers are calculated by performing a walk from the innermost
funclet to the outermost funclet.   Rudimentary support for the new EH
constructs has been added to the assembly printer, just enough to test
the new machinery.

Differential Revision: http://reviews.llvm.org/D12098

llvm-svn: 245331
2015-08-18 19:07:12 +00:00
Matthias Braun d55bcf2646 MachineRegisterInfo: Introduce isPhysRegUsed()
This method checks whether a physical register or any of its aliases are
used in the function.

Using this function in SIRegisterInfo::findUnusedReg() should also fix
this reported failure:

http://lists.llvm.org/pipermail/llvm-commits/Week-of-Mon-20150803/292143.html
http://reviews.llvm.org/rL242173#inline-533

The report doesn't come with a testcase and I don't know enough about
AMDGPU to create one myself.

llvm-svn: 245329
2015-08-18 18:54:27 +00:00
Sanjay Patel 1cd6d88e4d use minSize wrapper; NFCI
These were missed when other uses were switched over:
http://llvm.org/viewvc/llvm-project?view=revision&revision=243994

llvm-svn: 245311
2015-08-18 16:44:23 +00:00
Chad Rosier 3dd0e942b6 [AArch64] Simplify the logic for computing in bounds offset. NFC.
llvm-svn: 245307
2015-08-18 16:20:03 +00:00
Daniel Sanders 63f4a5dcad [mips] Expand JAL instructions when PIC is enabled.
Summary: This is the correct way to handle JAL instructions when PIC is enabled.

Patch by Toma Tabacu

Reviewers: seanbruno, tomatabacu

Subscribers: brooks, seanbruno, emaste, llvm-commits

Differential Revision: http://reviews.llvm.org/D6231

llvm-svn: 245305
2015-08-18 16:18:09 +00:00
Zoran Jovanovic 2fe8466f6e [mips][microMIPS] Implement DDIV, DMOD, DDIVU and DMODU instructions
Differential Revision: http://reviews.llvm.org/D10953

llvm-svn: 245297
2015-08-18 14:40:43 +00:00
Zoran Jovanovic a6593ff613 [mips][microMIPS] Implement SW and SWE instructions
Differential Revision: http://reviews.llvm.org/D10869

llvm-svn: 245293
2015-08-18 12:53:08 +00:00
Daniel Sanders a699444094 [mips] Make the MipsAsmParser capable of knowing whether PIC mode is enabled or not.
Summary:
This information is needed to decide whether we do the PIC-only JAL expansions or not. It's also needed for an upcoming patch which implements the .cprestore assembler directive (which can only be used effectively in PIC mode).

By making this information available to the MipsAsmParser, we will know when to insert the instructions mandated by the .cprestore assembler directive and we will be able to give some useful warnings when we encounter a potential misuse of this directive.

Patch by Toma Tabacu

Reviewers: dsanders, seanbruno

Subscribers: brooks, seanbruno, rafael, llvm-commits

Differential Revision: http://reviews.llvm.org/D5626

llvm-svn: 245291
2015-08-18 12:33:54 +00:00
Daniel Sanders f1ae367a99 [mips] Correct -Woverflow warning in r245208 without changing signedness of the constant.
This was supposed to have been committed as part of r245208

llvm-svn: 245285
2015-08-18 09:55:57 +00:00
Guozhi Wei f66d384443 Align SP adjustment in function getSPAdjust
This commit adds a new function TargetFrameLowering::alignSPAdjust
and calls it from TargetInstrInfo::getSPAdjust. It fixes PR24142.

llvm-svn: 245253
2015-08-17 22:36:27 +00:00
Douglas Katzman 685a7d1a70 [SPARC]: recognize '.' as the start of an assembler expression.
llvm-svn: 245232
2015-08-17 19:55:01 +00:00
James Molloy 974838f294 [ARM] Fix crash when targeting CPU without NEON
We emulate a scalar vmin/vmax with NEON instructions as they don't exist in the VFP ISA. So only mark these as legal when NEON is available.

Found here: https://code.google.com/p/chromium/issues/detail?id=521671

llvm-svn: 245231
2015-08-17 19:37:12 +00:00
Silviu Baranga b322aa6f53 [CostModel][AArch64] Increase cost of vector insert element and add missing cast costs
Summary:
Increase the estimated costs for insert/extract element operations on
AArch64. This is motivated by results from benchmarking interleaved
accesses.

Add missing costs for zext/sext/trunc instructions and some integer to
floating point conversions. These costs were previously calculated
by scalarizing these operations and were affected by the cost increase of
the insert/extract element operations.

Reviewers: rengolin

Subscribers: mcrosier, aemerson, rengolin, llvm-commits

Differential Revision: http://reviews.llvm.org/D11939

llvm-svn: 245226
2015-08-17 16:05:09 +00:00
Silviu Baranga d5ac26937c [CostModel][ARM] Increase cost of insert/extract operations
Summary:
This change limits the minimum cost of an insert/extract
element operation to 2 in cases where this would result
in mixing of NEON and VFP code.

Reviewers: rengolin

Subscribers: mssimpso, aemerson, llvm-commits, rengolin

Differential Revision: http://reviews.llvm.org/D12030

llvm-svn: 245225
2015-08-17 15:57:05 +00:00
Aaron Ballman aa3d810b5f Correcting a -Woverflow warning where 0xFFFF was overflowing an implicit constant conversion.
llvm-svn: 245220
2015-08-17 14:25:57 +00:00
Daniel Sanders a39ef1c68f [mips] [IAS] Add support for the DLA pseudo-instruction and fix problems with DLI
Summary: It is the same as LA, except that it can also load 64-bit addresses and it only works on 64-bit MIPS architectures.

Reviewers: tomatabacu, seanbruno, vkalintiris

Subscribers: brooks, seanbruno, emaste, llvm-commits

Differential Revision: http://reviews.llvm.org/D9524

llvm-svn: 245208
2015-08-17 10:11:55 +00:00
James Molloy 88edc8243d Remove hand-rolled matching for fmin and fmax.
SDAGBuilder now does this all for us.

llvm-svn: 245198
2015-08-17 07:13:20 +00:00
James Molloy c617be559a Rip out hand-rolled matching code for VMIN, VMAX, VMINNM and VMAXNM
This is no longer needed - SDAGBuilder will do this for us.

llvm-svn: 245197
2015-08-17 07:13:15 +00:00
Chandler Carruth 2f1fd1658f [PM] Port ScalarEvolution to the new pass manager.
This change makes ScalarEvolution a stand-alone object and just produces
one from a pass as needed. Making this work well requires making the
object movable, using references instead of overwritten pointers in
a number of places, and other refactorings.

I've also wired it up to the new pass manager and added a RUN line to
a test to exercise it under the new pass manager. This includes basic
printing support much like with other analyses.

But there is a big and somewhat scary change here. Prior to this patch
ScalarEvolution was never *actually* invalidated!!! Re-running the pass
just re-wired up the various other analyses and didn't remove any of the
existing entries in the SCEV caches or clear out anything at all. This
might seem OK as everything in SCEV that can uses ValueHandles to track
updates to the values that serve as SCEV keys. However, this still means
that as we ran SCEV over each function in the module, we kept
accumulating more and more SCEVs into the cache. At the end, we would
have a SCEV cache with every value that we ever needed a SCEV for in the
entire module!!! Yowzers. The releaseMemory routine would dump all of
this, but that isn't really called during normal runs of the pipeline as
far as I can see.

To make matters worse, there *is* actually a key that we don't update
with value handles -- there is a map keyed off of Loop*s. Because
LoopInfo *does* release its memory from run to run, it is entirely
possible to run SCEV over one function, then over another function, and
then lookup a Loop* from the second function but find an entry inserted
for the first function! Ouch.

To make matters still worse, there are plenty of updates that *don't*
trip a value handle. It seems incredibly unlikely that today GVN or
another pass that invalidates SCEV can update values in *just* such
a way that a subsequent run of SCEV will incorrectly find lookups in
a cache, but it is theoretically possible and would be a nightmare to
debug.

With this refactoring, I've fixed all this by actually destroying and
recreating the ScalarEvolution object from run to run. Technically, this
could increase the amount of malloc traffic we see, but then again it is
also technically correct. ;] I don't actually think we're suffering from
tons of malloc traffic from SCEV because if we were, the fact that we
never clear the memory would seem more likely to have come up as an
actual problem before now. So, I've made the simple fix here. If in fact
there are serious issues with too much allocation and deallocation,
I can work on a clever fix that preserves the allocations (while
clearing the data) between each run, but I'd prefer to do that kind of
optimization with a test case / benchmark that shows why we need such
cleverness (and that can test that we actually make it faster). It's
possible that this will make some things faster by making the SCEV
caches have higher locality (due to being significantly smaller) so
until there is a clear benchmark, I think the simple change is best.

Differential Revision: http://reviews.llvm.org/D12063

llvm-svn: 245193
2015-08-17 02:08:17 +00:00
Yaron Keren 178c465223 Add missing include guard.
llvm-svn: 245173
2015-08-16 07:55:08 +00:00
David Majnemer 1a59e49f3c [X86] Widen the 'AND' mask if doing so shrinks the encoding size
We can set additional bits in a mask given that we know the other
operand of an AND already has some bits set to zero.  This can be more
efficient if doing so allows us to use an instruction which implicitly
sign extends the immediate.

This fixes PR24085.
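
A minimal C++ sketch of the underlying identity, with hypothetical names
and values chosen only for illustration: if some bits of the other
operand are known to be zero, setting those bits in the mask does not
change the result of the AND.

```cpp
#include <cassert>
#include <cstdint>

// Set every mask bit that is known to be zero in the other operand.
constexpr std::uint32_t widenMask(std::uint32_t Mask,
                                  std::uint32_t KnownZeroBits) {
  return Mask | KnownZeroBits;
}

// Assumed example: the other operand has its top 24 bits clear.
constexpr std::uint32_t KnownZero = 0xFFFFFF00u;
constexpr std::uint32_t Narrow = 0x000000F0u;
constexpr std::uint32_t Wide = widenMask(Narrow, KnownZero);

static_assert(Wide == 0xFFFFFFF0u, "mask widened into the known-zero bits");
static_assert((0xABu & Narrow) == (0xABu & Wide), "same result for 0xAB");
```

In this example the widened mask 0xFFFFFFF0 is -16 when read as a
sign-extended 8-bit immediate, whereas 0xF0 does not fit in a signed
8-bit immediate.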

Differential Revision: http://reviews.llvm.org/D11289

llvm-svn: 245169
2015-08-16 04:52:11 +00:00
Sanjay Patel 40d4eb40f6 [x86] enable machine combiner reassociations for scalar single-precision minimums
llvm-svn: 245166
2015-08-15 17:01:54 +00:00
Yaron Keren 8b2a031cff Silence VS2015 warning.
Patch by James Touton!

http://reviews.llvm.org/D11890

llvm-svn: 245161
2015-08-15 14:54:43 +00:00
Simon Pilgrim 0750c84623 [DAGCombiner] Attempt to mask vectors before zero extension instead of after.
For cases where we TRUNCATE and then ZERO_EXTEND to a larger size (often from vector legalization), see if we can mask the source data and then ZERO_EXTEND (instead of after an ANY_EXTEND). This can help avoid having to generate a larger mask, and possibly applying it to several sub-vectors.

(zext (truncate x)) -> (zext (and x, m))

Includes a minor patch to SystemZ to better recognise 8/16-bit zero extension patterns from RISBG bit-extraction code.

This is the first of a number of minor patches to help improve the conversion of byte masks to clear mask shuffles.
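
A hedged C++ sketch of the scalar equivalence behind the fold above,
assuming a 32-bit source truncated to 8 bits and zero-extended to 64
bits. The function names are hypothetical:

```cpp
#include <cassert>
#include <cstdint>

// (zext (truncate x)): drop to 8 bits, then zero-extend to 64 bits.
constexpr std::uint64_t zextOfTrunc(std::uint32_t X) {
  return static_cast<std::uint64_t>(static_cast<std::uint8_t>(X));
}

// (zext (and x, m)): mask in the source type, then zero-extend.
constexpr std::uint64_t zextOfMasked(std::uint32_t X) {
  return static_cast<std::uint64_t>(X & 0xFFu);
}

static_assert(zextOfTrunc(0x12345678u) == 0x78u, "truncate then extend");
static_assert(zextOfMasked(0x12345678u) == 0x78u, "mask then extend");
```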

Differential Revision: http://reviews.llvm.org/D11764

llvm-svn: 245160
2015-08-15 13:27:30 +00:00
Matt Arsenault 588732bd6e AMDGPU/SI: Only look at live out SGPR defs
When trying to fix SGPR live ranges, skip defs that are
killed in the same block as the def. I don't think
we need to worry about these cases as long as the
live ranges of the SGPRs in dominating blocks are
correct.

This reduces the number of elements the second
loop over the function needs to look at, and makes
it generally easier to understand. The second loop
also only considers if the live range is live
in to a block, which logically means it
must have been live out from another.

llvm-svn: 245150
2015-08-15 02:58:49 +00:00
James Y Knight 5567bafe93 Remove redundant TargetFrameLowering::getFrameIndexOffset virtual
function.

This was the same as getFrameIndexReference, but without the FrameReg
output.

Differential Revision: http://reviews.llvm.org/D12042

llvm-svn: 245148
2015-08-15 02:32:35 +00:00
JF Bastien d4698e1bac [WebAssembly] Add Relooper
This is just an initial checkin of an implementation of the Relooper algorithm, in preparation for WebAssembly codegen to utilize. It doesn't do anything yet by itself.

The Relooper algorithm takes an arbitrary control flow graph and generates structured control flow from that, utilizing a helper variable when necessary to handle irreducibility. The WebAssembly backend will be able to use this in order to generate an AST for its binary format.

Author: azakai

Reviewers: jfb, sunfish

Subscribers: jevinskie, arsenm, jroelofs, llvm-commits

Differential revision: http://reviews.llvm.org/D11691

llvm-svn: 245142
2015-08-15 01:23:28 +00:00
Matt Arsenault 297ae311ce AMDGPU/SI: Fix printing useless info with amdhsa
The comments at the bottom would all report 0 if
amdhsa was used.

llvm-svn: 245135
2015-08-15 00:12:39 +00:00
Matt Arsenault 0259a7aa41 AMDGPU/SI: Update LiveVariables
This is simple but won't work if/when this pass
is moved to be post-SSA.

llvm-svn: 245134
2015-08-15 00:12:37 +00:00
Matt Arsenault 670ba46efe AMDGPU/SI: Update LiveIntervals during SIFixSGPRLiveRanges
Does not mark SlotIndexes as reserved, although I think
that might be OK.

LiveVariables still need to be handled.

llvm-svn: 245133
2015-08-15 00:12:35 +00:00
Matt Arsenault b75233235c AMDGPU: Remove unnecessary assert
These shouldn't ever be null. The number of successors
was already asserted to be 2.

llvm-svn: 245132
2015-08-15 00:12:32 +00:00
Matt Arsenault 4275c29a02 AMDGPU/SI: Make comments more precise.
True branch instructions do behave as expected with liveness.

Avoid the phrasing "branch decision is based on a value in an SGPR"
because this could be misleading. A VALU compare instruction's
result is still based on an SGPR, even though that condition
may be divergent.

llvm-svn: 245131
2015-08-15 00:12:30 +00:00
Pat Gavlin b399095c3f Add a target environment for CoreCLR.
Although targeting CoreCLR is similar to targeting MSVC, there are
certain important differences that the backend must be aware of
(e.g. differences in stack probes, EH, and library calls).

Differential Revision: http://reviews.llvm.org/D11012

llvm-svn: 245115
2015-08-14 22:41:43 +00:00
Ahmed Bougacha cd35787217 [AArch64] Fix FMLS scalar-indexed-from-2s-after-neg patterns.
We canonicalize V64 vectors to V128 through insert_subvector: the other
FMLA/FMLS/FMUL/FMULX patterns match that already, but this one doesn't,
so we'd fail to match fmls and generate fneg+fmla instead.
The vector equivalents are already tested and functional.

llvm-svn: 245107
2015-08-14 22:06:05 +00:00
Tom Stellard bef1094ee7 AMDGPU/SI: Add missing spill class
The compiler was failing to spill for some shaders.

Patch By: Axel Davy

llvm-svn: 245087
2015-08-14 19:46:05 +00:00
Renato Golin 980b6cc42b Revert "[ARM] Fix MachO CPU Subtype selection"
This reverts commit r245081, as it breaks many builds.

llvm-svn: 245086
2015-08-14 19:35:47 +00:00
Vedant Kumar 2f079be789 [ARM] Fix MachO CPU Subtype selection
This patch makes the Darwin ARM backend take advantage of TargetParser.  It
also teaches TargetParser about ARMV7K for the first time. This makes target
triple parsing more consistent across llvm.

Differential Revision: http://reviews.llvm.org/D11996

llvm-svn: 245081
2015-08-14 18:36:47 +00:00
Sanjay Patel ed502905f7 [x86] fix allowsMisalignedMemoryAccesses() implementation
This patch fixes the x86 implementation of allowsMisalignedMemoryAccesses() to correctly
return the 'Fast' output parameter for 32-byte accesses. To test that, an existing load
merging optimization is changed to use the TLI hook. This exposes a shortcoming in the
current logic and results in the regression test update. Changing other direct users of
the isUnalignedMem32Slow() x86 CPU attribute would be a follow-on patch.

Without the fix in allowsMisalignedMemoryAccesses(), we will infinite loop when targeting
SandyBridge because LowerINSERT_SUBVECTOR() creates 32-byte loads from two 16-byte loads
while PerformLOADCombine() splits them back into 16-byte loads.

Differential Revision: http://reviews.llvm.org/D10662

llvm-svn: 245075
2015-08-14 17:53:40 +00:00
Rafael Espindola dbaf0498a9 Revert "Centralize the information about which object format we are using."
This reverts commit r245047.

It was failing on the darwin bots. The problem was that when running

./bin/llc -march=msp430

llc gets to

  if (TheTriple.getTriple().empty())
    TheTriple.setTriple(sys::getDefaultTargetTriple());

Which means that we go with an arch of msp430 but a triple of
x86_64-apple-darwin14.4.0 which fails badly.

That code has to be updated to select a triple based on the value of
march, but that is not a trivial fix.

llvm-svn: 245062
2015-08-14 15:48:41 +00:00
Sanjay Patel 2e75341b7f don't repeat function names in comments; NFC
llvm-svn: 245058
2015-08-14 15:11:42 +00:00
Rafael Espindola 90eb70c8a7 Centralize the information about which object format we are using.
Other than some places that were handling unknown as ELF, this should
have no change. The test updates are because we were detecting
arm-coff or x86_64-win64-coff as ELF targets before.

It is not clear if the enum should live on the Triple. At least now it lives
in a single location and should be easier to move somewhere else.

llvm-svn: 245047
2015-08-14 13:31:17 +00:00
James Molloy 63be198712 [AArch64] FMINNAN/FMAXNAN on f16 is not legal.
Spotted by Ahmed - in r244594 I inadvertently marked f16 min/max as legal.

I've reverted it here, and marked min/max on scalar f16's as promote. I've also added a testcase. The test just checks that the compiler doesn't fall over - it doesn't create fmin nodes for f16 yet.

llvm-svn: 245035
2015-08-14 09:08:50 +00:00
David Majnemer b611e3f50e [IR] Add token types
This introduces the basic functionality to support "token types".
The motivation stems from the need to perform operations on a Value
whose provenance cannot be obscured.

There are several applications for such a type but my immediate
motivation stems from WinEH.  Our personality routine enforces a
single-entry - single-exit regime for cleanups.  After several rounds of
optimizations, we may be left with a terminator whose "cleanup-entry
block" is not entirely clear because control flow has merged two
cleanups together.  We have experimented with using labels as operands
inside of instructions which are not terminators to indicate where we
came from but found that LLVM does not expect such exotic uses of
BasicBlocks.

Instead, we can use this new type to clearly associate the "entry point"
and "exit point" of our cleanup.  This is done by having the cleanuppad
yield a Token and consuming it at the cleanupret.
The token type makes it impossible to obscure or otherwise hide the
Value, making it trivial to track the relationship between the two
points.

What is the burden to the optimizer?  Well, it turns out we have already
paid down this cost by accepting that there are certain calls that we
are not permitted to duplicate; optimizations have to watch out for
such instructions anyway.  There are additional places in the optimizer
that we will probably have to update but early examination has given me
the impression that this will not be heroic.

Differential Revision: http://reviews.llvm.org/D11861

llvm-svn: 245029
2015-08-14 05:09:07 +00:00
Saleem Abdulrasool 3e190cb098 PowerPC: remove dead initialization (NFC)
Identified by the clang static analyzer.  No functional change intended.

llvm-svn: 245022
2015-08-14 03:48:35 +00:00
Simon Pilgrim 7218251861 [AMDGPU] Use the general SMAX/SMIN/UMAX/UMIN pattern matching and remove the AMDGPU implementation
D9746 added general SMAX/SMIN/UMAX/UMIN pattern matching to SelectionDAGBuilder::visitSelect.

Differential Revision: http://reviews.llvm.org/D12007

llvm-svn: 244960
2015-08-13 21:40:02 +00:00
Ahmed Bougacha 80e4ac802a [AArch64] Provide "too few operands" diags on short-form NEON also.
We used to just say "invalid type suffix for instruction", which is
misleading. This is because we fall back to the long-form matcher if the
short-form matcher failed, losing the error information on the way.

Save it, so that we can provide a little better diagnostics when the
long-form matcher thinks a suffix is the cause of the error.

llvm-svn: 244955
2015-08-13 21:09:13 +00:00
Simon Pilgrim 4a8d6b3b9e [X86][SSE] Use the general SMAX/SMIN/UMAX/UMIN pattern matching and remove the X86 implementation
Follow up to D10947 - D9746 added general SMAX/SMIN/UMAX/UMIN pattern matching to SelectionDAGBuilder::visitSelect.

This patch removes the X86 implementation and improves the AVX1/AVX2 support to correctly lower 256-bit integer vectors.

Differential Revision: http://reviews.llvm.org/D12006

llvm-svn: 244949
2015-08-13 20:45:55 +00:00
Yaron Keren 556b21aa10 Remove and forbid raw_svector_ostream::flush() calls.
After r244870 flush() will only compare two null pointers and return,
doing nothing but wasting run time. The call is not required any more
as the stream and its SmallString are always in sync.

Thanks to David Blaikie for reviewing.

llvm-svn: 244928
2015-08-13 18:12:56 +00:00
Nemanja Ivanovic 1c39ca6501 Scalar to vector conversions using direct moves
This patch corresponds to review:
http://reviews.llvm.org/D11471

It improves the code generated for converting a scalar to a vector value. With
direct moves from GPRs to VSRs, we no longer require expensive stack operations
for this. Subsequent patches will handle the reverse case and more general
operations between vectors and their scalar elements.

llvm-svn: 244921
2015-08-13 17:40:44 +00:00
James Molloy 31117875c2 [ARM] FMINNAN/FMAXNAN of f64 are not legal.
This was my error. We've got f32 marked as legal because they're simulated using a v2f32 instruction, but there's no equivalent for f64.

This will get test coverage imminently when D12015 lands.

llvm-svn: 244916
2015-08-13 17:28:26 +00:00
James Molloy c71f78f49f [ARM] Allow vmin/vmax of scalars to be emitted without UseNEONForFP.
This overrides the default to more closely resemble the hand-crafted matching logic in ISelLowering. It makes sense, as there is no VFP equivalent of vmin or vmax, to use them when they're available even if in general VFP ops should be preferred.

This should be NFC.

llvm-svn: 244915
2015-08-13 17:28:20 +00:00
Ulrich Weigand a887f06214 [SystemZ] Support large LLVM IR struct return values
Recent mesa/llvmpipe crashes on SystemZ due to a failed assertion when
attempting to compile a routine with a return type of
  { <4 x float>, <4 x float>, <4 x float>, <4 x float> }
on a system without vector instruction support.

This is because after legalizing the vector type, we get a return value
consisting of 16 floats, which cannot all be returned in registers.

Usually, what should happen in this case is that the target's CanLowerReturn
routine rejects the return type, in which case SelectionDAG falls back to
implementing a structure return in memory via implicit reference.

However, the SystemZ target never actually implemented any CanLowerReturn
routine, and thus would accept any struct return type.

This patch fixes the crash by implementing CanLowerReturn.  As a side effect,
this also handles fp128 return values, fixing a todo that was noted in
SystemZCallingConv.td.

llvm-svn: 244889
2015-08-13 13:37:06 +00:00
John Brawn 68acdcb435 [ARM] Reorganise and simplify thumb-1 load/store selection
Other than PC-relative loads/stores the patterns that match the various
load/store addressing modes have the same complexity, so the order that they
are matched is the order that they appear in the .td file.

Rearrange the instruction definitions in ARMInstrThumb.td, and make use of
AddedComplexity for PC-relative loads, so that the instruction matching order
is the order that results in the simplest selection logic. This also makes
register-offset load/store be selected when it should, as previously it was
only selected for too-large immediate offsets.

Differential Revision: http://reviews.llvm.org/D11800

llvm-svn: 244882
2015-08-13 10:48:22 +00:00
Ahmed Bougacha 2a97b1bcf8 [AArch64] Also custom-lower mismatched vector/f16 FCOPYSIGN.
We can lower them using our cool tricks if we fpext/fptrunc the second
input, like we do for f32/f64.

Follow-up to r243924, r243926, and r244858.

llvm-svn: 244860
2015-08-13 01:13:56 +00:00
JF Bastien 71d29acecd WebAssembly: floating-point comparisons
Summary:
D11924 implemented part of the floating-point comparisons, this patch implements the rest:
 * Tell ISelLowering that all booleans are either 0 or 1.
 * Expand the eq/ne/lt/le/gt/ge floating-point comparisons to the canonical ones (similar to what Mips32r6InstrInfo.td does).
 * Add tests for ord/uno.
 * Add tests for ueq/one/ult/ule/ugt/uge.
 * Fix existing comparison tests to remove the (res & 1) code, which setBooleanContents stops from generating.
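
As a rough sketch of what expanding to the canonical comparisons means, the unordered predicates can be built from the ordered ones plus an "unordered" check. The helper names below are hypothetical and this is plain C++, not the TableGen patterns from the patch.

```cpp
#include <cmath>

// "uno": true when either operand is NaN; "ord" is its negation.
bool uno(double a, double b) { return std::isnan(a) || std::isnan(b); }
bool ord(double a, double b) { return !uno(a, b); }

// Unordered-or-X predicates: true if the operands are unordered,
// otherwise the result of the ordinary ordered comparison.
bool ueq(double a, double b) { return uno(a, b) || a == b; }
bool ult(double a, double b) { return uno(a, b) || a < b; }
bool ule(double a, double b) { return uno(a, b) || a <= b; }
bool ugt(double a, double b) { return uno(a, b) || a > b; }
bool uge(double a, double b) { return uno(a, b) || a >= b; }

// "one": ordered and not equal. C++ '!=' alone is true for NaN operands,
// so the ordered check has to be explicit.
bool one(double a, double b) { return ord(a, b) && a != b; }
```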

Reviewers: sunfish

Subscribers: llvm-commits, jfb

Differential Revision: http://reviews.llvm.org/D11970

llvm-svn: 244779
2015-08-12 17:53:29 +00:00
Sanjay Patel 2366168bad 80-cols; NFC
llvm-svn: 244755
2015-08-12 15:12:25 +00:00
Sanjay Patel dc87d1440c fix typo; NFC
llvm-svn: 244753
2015-08-12 15:09:09 +00:00
Zoran Jovanovic 366783e14c [mips][microMIPS] Create microMIPS64r6 subtarget and implement DALIGN, DAUI, DAHI, DATI, DEXT, DEXTM and DEXTU instructions
Differential Revision: http://reviews.llvm.org/D10923

llvm-svn: 244744
2015-08-12 12:45:16 +00:00
Michael Kuperstein fe0d9bb6eb [X86] Disable mul -> shl + lea combine when compiling for minsize
Differential Revision: http://reviews.llvm.org/D11904

llvm-svn: 244740
2015-08-12 11:27:26 +00:00
Michael Kuperstein bc7f99a3ab [X86] Allow x86 call frame optimization to fold more loads into pushes
This abstracts away the test for "when can we fold across a MachineInstruction"
into the MI interface, and changes call-frame optimization to use the same test
the peephole optimizer uses.

Differential Revision: http://reviews.llvm.org/D11945

llvm-svn: 244729
2015-08-12 10:14:58 +00:00
Matt Arsenault c574686529 AMDGPU: Fix assert on dbg_value instructions
llvm-svn: 244728
2015-08-12 09:04:44 +00:00
Simon Pilgrim 8c049d5c03 [InstCombine] Move SSE/AVX vector blend folding to instcombiner
As discussed in D11886, this patch moves the SSE/AVX vector blend folding to instcombiner from PerformINTRINSIC_WO_CHAINCombine (which allows us to remove this completely).

InstCombiner already had partial support for this, I just had to add support for zero (ConstantAggregateZero) masks and also the case where both selection inputs were the same (allowing us to ignore the mask).

I also moved all the relevant combine tests into InstCombine/blend_x86.ll

Differential Revision: http://reviews.llvm.org/D11934

llvm-svn: 244723
2015-08-12 08:08:56 +00:00
Saleem Abdulrasool 9e5f2a96f1 X86: hoist a condition into a variable (NFC)
The same value is used multiple times through the function.  Hoist the condition
into a variable.  This should fix a silly static analysis warning where the
conditions flip around.  No functional change intended.

llvm-svn: 244713
2015-08-12 02:01:36 +00:00
Sanjay Patel 260b6d36f4 [x86] enable machine combiner reassociations for 256-bit vector FP mul/add
llvm-svn: 244705
2015-08-12 00:29:10 +00:00
Alex Lorenz 5659a2f961 PseudoSourceValue: Transform the mips subclass to target independent subclasses
This commit transforms the mips-specific 'MipsCallEntry' subclass of the
'PseudoSourceValue' class into two, target-independent subclasses named
'GlobalValuePseudoSourceValue' and 'ExternalSymbolPseudoSourceValue'.

This change makes it easier to serialize the pseudo source values by removing
target-specific pseudo source values.

Reviewers: Akira Hatanaka
llvm-svn: 244698
2015-08-11 23:23:17 +00:00
Alex Lorenz e40c8a2b26 PseudoSourceValue: Replace global manager with a manager in a machine function.
This commit removes the global manager variable which is responsible for
storing and allocating pseudo source values and instead it introduces a new
manager class named 'PseudoSourceValueManager'. Machine functions now own an
instance of the pseudo source value manager class.

This commit also modifies the 'get...' methods in the 'MachinePointerInfo'
class to construct pseudo source values using the instance of the pseudo
source value manager object from the machine function.

This commit updates calls to the 'get...' methods from the 'MachinePointerInfo'
class in a lot of different files because those calls now need to pass in a
reference to a machine function to those methods.

This change will make it easier to serialize pseudo source values as it will
enable me to transform the mips specific MipsCallEntry PseudoSourceValue
subclass into two target independent subclasses.

Reviewers: Akira Hatanaka
llvm-svn: 244693
2015-08-11 23:09:45 +00:00
Alex Lorenz c49e4fe9cc PseudoSourceValue: Introduce a 'PSVKind' enumerator.
This commit introduces a new enumerator named 'PSVKind' in the
'PseudoSourceValue' class. This enumerator is now used to distinguish between
the various kinds of pseudo source values.

This change is done in preparation for the changes to the pseudo source value
object management and to the PseudoSourceValue's class hierarchy - the next two
PseudoSourceValue commits will get rid of the global variable that manages the
pseudo source values and the mips specific MipsCallEntry subclass.

Reviewers: Akira Hatanaka
llvm-svn: 244687
2015-08-11 22:32:00 +00:00
Mark Heffernan 438ffe5eac Use 32-bit divides instead of 64-bit divides where possible.
For NVPTX, try to use 32-bit division instead of 64-bit division when the dividend and divisor
fit in 32 bits. This speeds up some internal benchmarks significantly. The underlying reason
is that many index computations are carried out in 64-bits but never actually exceed the
capacity of a 32-bit word.
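
A minimal sketch of the idea, assuming a run-time check on the operands (the message above does not say whether the backend decides this statically or dynamically, and `udivNarrowing` is a made-up name):

```cpp
#include <cstdint>

// Divide two unsigned 64-bit values, taking a 32-bit path when both fit.
// The caller must ensure divisor != 0.
uint64_t udivNarrowing(uint64_t dividend, uint64_t divisor) {
  // If neither operand has a bit set above bit 31, the 32-bit divide
  // yields the same quotient as the 64-bit one.
  if (((dividend | divisor) >> 32) == 0) {
    return static_cast<uint32_t>(dividend) / static_cast<uint32_t>(divisor);
  }
  return dividend / divisor;
}
```

The OR of the two operands lets a single shift-and-compare cover both, which keeps the check cheap relative to the divide it may avoid.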

llvm-svn: 244684
2015-08-11 22:16:34 +00:00
JF Bastien da06bce8b5 WebAssembly: implement comparison.
Some of the FP comparisons (ueq, one, ult, ule, ugt, uge) are currently broken, I'll fix them in a follow-up.

Reviewers: sunfish

Subscribers: llvm-commits, jfb

Differential Revision: http://reviews.llvm.org/D11924

llvm-svn: 244665
2015-08-11 21:02:46 +00:00
Sanjay Patel 2c6a01570d [x86] enable machine combiner reassociations for 128-bit vector single/double multiplies
llvm-svn: 244657
2015-08-11 20:19:23 +00:00
JF Bastien 480c840896 WebAssembly: implement WebAssemblyTargetLowering::getTargetNodeName
Summary: Implementation is the same as in AArch64.

Subscribers: aemerson, jfb, llvm-commits, sunfish

Differential Revision: http://reviews.llvm.org/D11956

llvm-svn: 244655
2015-08-11 20:13:18 +00:00
Rafael Espindola 3adc7ce9f1 Use llvm::make_unique to fix the MSVC build.
llvm-svn: 244641
2015-08-11 18:11:17 +00:00
Michael Kuperstein 243c073a2e [X86] Allow merging of immediates within a basic block for code size savings
First step in preventing immediates that occur more than once within a single
basic block from being pulled into their users, in order to prevent unnecessary
large instruction encoding. Currently enabled only when optimizing for size.

Patch by: zia.ansari@intel.com
Differential Revision: http://reviews.llvm.org/D11363

llvm-svn: 244601
2015-08-11 14:10:58 +00:00
James Molloy b7b2a1e9b4 [AArch64] Match fminnum/fmaxnum for vector fminnm/fmaxnm instead of an intrinsic.
Lower Intrinsic::aarch64_neon_fmin/fmax to fminnum/fmaxnum and match that instead. Minimal functional change:

  - Extra tests added because coverage of scalar fminnm/fmaxnm instructions was nonexistent.
  - f16 test updated because now we actually generate scalar fminnm/fmaxnm we no longer need to bail out to a libcall!

llvm-svn: 244595
2015-08-11 12:06:37 +00:00
James Molloy edf38f0cb0 [AArch64] Replace the custom AArch64ISD::FMIN/MAX nodes with ISD::FMINNAN/MAXNAN
NFCI. This just removes custom ISDNodes that are no longer needed.

llvm-svn: 244594
2015-08-11 12:06:33 +00:00
James Molloy d616c642bb [ARM] Match fminnan/fmaxnan for vector vmin/vmax instead of an intrinsic
Lower Intrinsic::arm_neon_vmins/vmaxs to fminnan/fmaxnan and match that instead. This is important because SDAG will soon be able to select FMINNAN itself, so we need a unified lowering path for intrinsics and SDAG.

NFCI.

llvm-svn: 244593
2015-08-11 12:06:28 +00:00
James Molloy ee868b2a3e [ARM] Match fminnum/fmaxnum for vector vminnm/vmaxnm instead of an intrinsic
Lower the intrinsic to a FMINNUM/FMAXNUM node and select that instead. This is important because soon SDAG will be able to select FMINNUM/FMAXNUM itself, so we need an integrated lowering path between SDAG and intrinsics.

NFCI.

llvm-svn: 244592
2015-08-11 12:06:25 +00:00
James Molloy ea3a687a33 [ARM] Replace ARMISD::VMINNM/VMAXNM with ISD::FMINNUM/FMAXNUM
NFCI. This replaces another custom ISDNode with a generic equivalent.

llvm-svn: 244591
2015-08-11 12:06:22 +00:00
James Molloy db8ee4b5a9 [ARM] Replace ARMISD::FMIN/FMAX with the shiny new ISD::FMINNAN/FMAXNAN.
NFCI. This removes a custom ISDNode.

llvm-svn: 244590
2015-08-11 12:06:15 +00:00
Marina Yatsina 8c997af103 [X86] Add SAL mnemonics for Intel syntax
SAL and SHL instructions perform the same operation

Differential Revision: http://reviews.llvm.org/D11882

llvm-svn: 244588
2015-08-11 12:05:06 +00:00
Marina Yatsina d353c45eaf [X86] Fix REPE, REPZ, REPNZ for intel syntax
REPE, REPZ, REPNZ, REPNE should have mnemonics for Intel syntax as well.
Currently using these instructions causes compilation errors for Intel syntax.

Differential Revision: http://reviews.llvm.org/D11794

llvm-svn: 244584
2015-08-11 11:28:10 +00:00
Marina Yatsina f6bc15d763 [X86] Fix imul alias for intel syntax
The "imul reg, imm" alias is not defined for intel syntax. 
In intel syntax there is no w/l/q suffix for the imul instruction.

Differential Revision: http://reviews.llvm.org/D11887

llvm-svn: 244582
2015-08-11 10:43:04 +00:00
Vasileios Kalintiris 1c78ca6a09 [mips] Remap move as or.
Summary:
This patch remaps the assembly idiom 'move' to 'or' instead of 'daddu' or
'addu'. The use of addu/daddu instead of or as move was highlighted as a
performance issue during the analysis of a recent 64bit design. Originally
move was encoded as 'or' by binutils but was changed for the r10k cpu family
due to their pipeline which had 2 arithmetic units and a single logical unit,
and so could issue multiple (d)addu based moves at the same time but only 1
logical move.

This patch preserves the disassembly behaviour so that disassembling an old-style
(d)addu move still appears as move, but assembling move always gives an or.
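
To make the encodings concrete, here is a small sketch using the standard MIPS R-type layout and the SPECIAL function codes for addu (0x21), or (0x25) and daddu (0x2d). The helper names are hypothetical and the check is deliberately simplified; it is not the assembler or disassembler code from the patch.

```cpp
#include <cstdint>

constexpr uint32_t kFunctAddu = 0x21;
constexpr uint32_t kFunctOr = 0x25;
constexpr uint32_t kFunctDaddu = 0x2d;

// R-type word: opcode(6)=0 | rs(5) | rt(5) | rd(5) | shamt(5)=0 | funct(6).
constexpr uint32_t encodeRType(uint32_t rs, uint32_t rt, uint32_t rd,
                               uint32_t funct) {
  return (rs << 21) | (rt << 16) | (rd << 11) | funct;
}

// New behaviour: "move rd, rs" assembles as "or rd, rs, $zero".
constexpr uint32_t encodeMoveAsOr(uint32_t rd, uint32_t rs) {
  return encodeRType(rs, /*rt=*/0, rd, kFunctOr);
}

// Simplified disassembly-side check: or/addu/daddu with rt == $zero all
// read as a move, so old-style encodings are still recognised.
bool isMoveIdiom(uint32_t insn) {
  const uint32_t opcode = insn >> 26;
  const uint32_t rt = (insn >> 16) & 0x1f;
  const uint32_t shamt = (insn >> 6) & 0x1f;
  const uint32_t funct = insn & 0x3f;
  return opcode == 0 && rt == 0 && shamt == 0 &&
         (funct == kFunctOr || funct == kFunctAddu || funct == kFunctDaddu);
}
```

For example, `move $2, $4` encodes as 0x00801025 under this scheme, while the addu-based form 0x00801021 is still accepted by `isMoveIdiom`.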

Patch by Simon Dardis.

Reviewers: vkalintiris

Subscribers: llvm-commits

Differential Revision: http://reviews.llvm.org/D11796

llvm-svn: 244579
2015-08-11 08:56:25 +00:00
Michael Kuperstein 7337ee23d8 [X86] When optimizing for minsize, use POP for small post-call stack clean-up
When optimizing for size, replace "addl $4, %esp" and "addl $8, %esp"
following a call by one or two pops, respectively. We don't try to do it in
general, but only when the stack adjustment immediately follows a call - which
is the most common case.

That allows taking a short-cut when trying to find a free register to pop into,
instead of a full-blown liveness check. If the adjustment immediately follows a
call, then every register the call clobbers but doesn't define should be dead at
that point, and can be used.

Differential Revision: http://reviews.llvm.org/D11749

llvm-svn: 244578
2015-08-11 08:48:48 +00:00
JF Bastien 11bf0da0d7 WebAssembly: NFC fix release build break, unused variable.
Summary: Caused by D11914, pointed out by blaikie.

Subscribers: llvm-commits, jfb, dblaikie

Differential Revision: http://reviews.llvm.org/D11929

llvm-svn: 244570
2015-08-11 04:52:24 +00:00
JF Bastien ef172fc9f0 WebAssembly: add basic floating-point tests
Summary: I somehow forgot to add these when I added the basic floating-point opcodes. Also remove ceil/floor/trunc/nearestint for now, and add them only when properly tested.

Subscribers: llvm-commits, sunfish, jfb

Differential Revision: http://reviews.llvm.org/D11927

llvm-svn: 244562
2015-08-11 02:45:15 +00:00
Cameron Esfahani f97999dc46 Explicitly clear the MI operand list when getInstruction() is called. Call MI.clear() within MCD::OPC_Decode case and inside of translateInstruction() for the X86 target. Remove now unnecessary MI.clear() from ARMDisassembler.
Summary: Explicitly clear the MI operand list when getInstruction() is called.

Reviewers: hfinkel, t.p.northover, hvarga, kparzysz, jyknight, qcolombet, uweigand

Subscribers: llvm-commits

Differential Revision: http://reviews.llvm.org/D11665

llvm-svn: 244557
2015-08-11 01:15:07 +00:00
JF Bastien e73ce68225 WebAssembly: simply assert on SNaN and NaNs with payloads
Summary: convertToHexString doesn't represent them correctly at this point in time. This is a follow-up to sunfish's suggestion in D11914.

Subscribers: llvm-commits, sunfish, jfb

Differential Revision: http://reviews.llvm.org/D11925

llvm-svn: 244551
2015-08-11 00:49:20 +00:00
Joerg Sonnenberger ebe7bf44ec Add lduw and lwua aliases for SPARCv9.
llvm-svn: 244535
2015-08-10 23:47:22 +00:00
Joerg Sonnenberger 2ee3d76737 Load/store for float registers from/to alternate space.
llvm-svn: 244532
2015-08-10 23:33:17 +00:00
JF Bastien 4a6422562d WebAssembly: print immediates
Summary:
For now output using C99's hexadecimal floating-point representation.

This patch also cleans up how machine operands are printed: instead of special-casing per type of machine instruction, the code now handles operands generically.
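
For readers unfamiliar with the hexadecimal floating-point notation, a minimal sketch using the C standard library's `%a` conversion is shown here. It only illustrates the format; the helper names are invented and this is not the printing code used by the backend.

```cpp
#include <cstdio>
#include <cstdlib>
#include <string>

// Format a double in C99 hexadecimal floating-point notation.
std::string toHexFloat(double v) {
  char buf[64];
  std::snprintf(buf, sizeof buf, "%a", v);
  return buf;
}

// Parse the notation back; strtod accepts hexadecimal floats.
double fromHexFloat(const std::string& s) {
  return std::strtod(s.c_str(), nullptr);
}
```

The appeal of this notation is that every finite value round-trips exactly, with no decimal rounding involved.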

Reviewers: sunfish

Subscribers: llvm-commits, jfb

Differential Revision: http://reviews.llvm.org/D11914

llvm-svn: 244520
2015-08-10 22:36:48 +00:00
Joerg Sonnenberger 6dce129051 Add support for the signx instruction alias of SPARCv9.
llvm-svn: 244519
2015-08-10 22:32:25 +00:00
JF Bastien fa9746dc8d x86: Emit LAHF/SAHF instead of PUSHF/POPF
NaCl's sandbox doesn't allow PUSHF/POPF out of security concerns (privileged emulators have forgotten to mask system bits in the past, and EFLAGS's DF bit is a constant source of hilarity). Commit r220529 fixed PR20376 by saving cmpxchg's flags result using EFLAGS; this commit now generates LAHF/SAHF instead, for all of x86 (not just NaCl) because it leads to an overall performance gain over PUSHF/POPF.

As with the previous patch this code generation is pretty bad because it occurs very late, after register allocation, and in many cases it rematerializes flags which were already available (e.g. already in a register through SETE). Fortunately it's somewhat rare that this code needs to fire.

I did [[ https://github.com/jfbastien/benchmark-x86-flags | a bit of benchmarking ]], the results on an Intel Haswell E5-2690 CPU at 2.9GHz are:

| Time per call (ms)  | Runtime (ms) | Benchmark                      |
| 0.000012514         |      6257    | sete.i386                      |
| 0.000012810         |      6405    | sete.i386-fast                 |
| 0.000010456         |      5228    | sete.x86-64                    |
| 0.000010496         |      5248    | sete.x86-64-fast               |
| 0.000012906         |      6453    | lahf-sahf.i386                 |
| 0.000013236         |      6618    | lahf-sahf.i386-fast            |
| 0.000010580         |      5290    | lahf-sahf.x86-64               |
| 0.000010304         |      5152    | lahf-sahf.x86-64-fast          |
| 0.000028056         |     14028    | pushf-popf.i386                |
| 0.000027160         |     13580    | pushf-popf.i386-fast           |
| 0.000023810         |     11905    | pushf-popf.x86-64              |
| 0.000026468         |     13234    | pushf-popf.x86-64-fast         |

Clearly `PUSHF`/`POPF` are suboptimal. It doesn't really seem to be worth teaching LLVM about individual flags, at least not for this purpose.

Reviewers: rnk, jvoung, t.p.northover

Subscribers: llvm-commits

Differential revision: http://reviews.llvm.org/D6629

llvm-svn: 244503
2015-08-10 20:59:36 +00:00
Sanjay Patel d09391c8cd fix minsize detection: minsize attribute implies optimizing for size
llvm-svn: 244499
2015-08-10 20:45:44 +00:00
Simon Pilgrim a3a72b41de [InstCombine] Move SSE2/AVX2 arithmetic vector shift folding to instcombiner
As discussed in D11760, this patch moves the (V)PSRA(WD) arithmetic shift-by-constant folding to InstCombine to match the logical shift implementations.

Differential Revision: http://reviews.llvm.org/D11886

llvm-svn: 244495
2015-08-10 20:21:15 +00:00
James Y Knight 3994be87de [Sparc] Implement i64 load/store support for 32-bit sparc.
The LDD/STD instructions can load/store a 64bit quantity from/to
memory to/from a consecutive even/odd pair of (32-bit) registers. They
are part of SparcV8, and also present in SparcV9. (Although deprecated
there, as you can store 64bits in one register).

As recommended on llvmdev in the thread "How to enable use of 64bit
load/store for 32bit architecture" from Apr 2015, I've modeled the
64-bit load/store operations as working on a v2i32 type, rather than
making i64 a legal type, but with few legal operations. The latter
does not (currently) work, as there is much code in llvm which assumes
that if i64 is legal, operations like "add" will actually work on it.

The same assumption does not hold for v2i32 -- for vector types, it is
workable to support only load/store, and expand everything else.
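
A rough model of the v2i32 view, written as a sketch: `IntPair` here is a hypothetical struct standing in for an even/odd register pair, and the helpers are invented for the example. It assumes the usual big-endian SPARC layout, where the word at the lower address (the most significant half) lands in the even register.

```cpp
#include <cstdint>

// Hypothetical model of an even/odd pair of 32-bit registers.
struct IntPair {
  uint32_t even;  // word at the effective address (most significant half)
  uint32_t odd;   // word at the effective address + 4
};

// View a 64-bit value as two 32-bit lanes, as an LDD would load it.
IntPair splitI64(uint64_t v) {
  return {static_cast<uint32_t>(v >> 32), static_cast<uint32_t>(v)};
}

// Reassemble the 64-bit value, as an STD would store it.
uint64_t joinI64(IntPair p) {
  return (static_cast<uint64_t>(p.even) << 32) | p.odd;
}

// Only an even-numbered register can name the first half of a pair.
bool canStartPair(unsigned regNum) { return regNum % 2 == 0; }
```

Treating the pair as two lanes means only the load, store and bitcast need real support, and everything else can be expanded.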

This patch:
- Adds a new register class, IntPair, for even/odd pairs of registers.

- Modifies the list of reserved registers, the stack spilling code,
  and register copying code to support the IntPair register class.

- Adds support in AsmParser. (note that in asm text, you write the
  name of the first register of the pair only. So the parser has to
  morph the single register into the equivalent paired register).

- Adds the new instructions themselves (LDD/STD/LDDA/STDA).

- Hooks up the instructions and registers as a vector type v2i32. Adds
  custom legalizer to transform i64 load/stores into v2i32 load/stores
  and bitcasts, so that the new instructions can actually be
  generated, and marks all operations other than load/store on v2i32
  as needing to be expanded.

- Copies the unfortunate SelectInlineAsm hack from ARMISelDAGToDAG.
  This hack undoes the transformation of i64 operands into two
  arbitrarily-allocated separate i32 registers in
  SelectionDAGBuilder, and instead passes them in a single
  IntPair. (Arbitrarily allocated registers are not useful, asm code
  expects to be receiving a pair, which can be passed to ldd/std.)

Also adds a bunch of test cases covering all the bugs I've added along
the way.

Differential Revision: http://reviews.llvm.org/D8713

llvm-svn: 244484
2015-08-10 19:11:39 +00:00
Chad Rosier c56a9132d0 [AArch64] Convert a conditional check that will always be true to an assert. NFC.
llvm-svn: 244479
2015-08-10 18:42:45 +00:00
Chad Rosier caed6db51e Typo. Move comment closer to relevant code. NFC.
llvm-svn: 244465
2015-08-10 17:17:19 +00:00
Sanjay Patel 10294b59de fix minsize detection: minsize attribute implies optimizing for size
llvm-svn: 244464
2015-08-10 17:15:17 +00:00
Sanjay Patel 0f12d71b49 fix minsize detection: minsize attribute implies optimizing for size
llvm-svn: 244463
2015-08-10 17:00:44 +00:00
Sanjay Patel 68b0325a9e fix minsize detection: minsize attribute implies optimizing for size
llvm-svn: 244460
2015-08-10 16:47:47 +00:00
Sanjay Patel 9a9003d94c fix minsize detection: minsize attribute implies optimizing for size
llvm-svn: 244458
2015-08-10 16:43:20 +00:00
Marina Yatsina a0e02410e1 Test commit to verify commit access
llvm-svn: 244438
2015-08-10 11:33:10 +00:00
Saleem Abdulrasool 6bc5ed3e7a X86: remove a dead store (NFC)
The SP was always unconditionally assigned to later, but initialised early.
This delays the initialisation, and avoids the dead store.  Identified by
clang static analysis.  No functional change intended.

llvm-svn: 244423
2015-08-09 20:39:09 +00:00
Sanjay Patel e0178262d4 [x86] enable machine combiner reassociations for 128-bit vector single/double adds
llvm-svn: 244403
2015-08-08 19:08:20 +00:00
Benjamin Kramer df005cbe19 Fix some comment typos.
llvm-svn: 244402
2015-08-08 18:27:36 +00:00
Craig Topper cb1f601a7b [X86] Add ADX and RDSEED to Skylake processor.
llvm-svn: 244396
2015-08-08 07:31:15 +00:00
Craig Topper 01dd4ea334 Add SlowBTMem to Sandy Bridge and newer Intel CPUs. Reading through Agner Fog's table suggests there have been no improvements to these processors relative to Westmere for bit test instructions.
llvm-svn: 244395
2015-08-08 07:20:04 +00:00
Tom Stellard 30cf77457d AMDGPU/SI: Another attempt to fix Windows bots broken by r244372
llvm-svn: 244383
2015-08-08 01:11:07 +00:00
Matt Arsenault b130076469 Remove unnecessary includes
llvm-svn: 244382
2015-08-08 00:41:53 +00:00
Matt Arsenault cbd753761a AMDGPU: Implement AMDGPUOperand::print()
llvm-svn: 244381
2015-08-08 00:41:51 +00:00
Matt Arsenault 4635915504 AMDGPU/SI: Remove VCCReg
llvm-svn: 244380
2015-08-08 00:41:48 +00:00
Matt Arsenault 6942d1a034 AMDGPU/SI: Remove source uses of VCCReg
llvm-svn: 244379
2015-08-08 00:41:45 +00:00
Tom Stellard fc70950bf2 AMDGPU/SI: Attempt to fix Windows bots broken by r244372
llvm-svn: 244376
2015-08-08 00:17:59 +00:00
Tom Stellard fd25395c72 AMDGPU: Add pass to lower OpenCL image and sampler arguments.
The pass adds new kernel arguments for image attributes, and
resolves calls to dummy attribute and resource id getter functions.

Patch by: Zoltan Gilian

llvm-svn: 244372
2015-08-07 23:19:30 +00:00
Quentin Colombet 7d8c74ff3f [AArch64][LoadStoreOptimizer] Turn a test into an assert. NFC.
At this point the given Opc must be valid, otherwise we should
not look for a matching pair to form paired load or store.

Thanks to Chad for pointing out this piece of code!

llvm-svn: 244366
2015-08-07 22:40:51 +00:00
Tom Stellard 8ebad11ee9 AMDGPU/SI: Use InstAlias instead of MnemonicAlias for VOPC instructions
Summary:
With InstAlias, we don't need to print the _e32 portion of the mnemonic
when we print the $dst operand.  This change makes it possible to
include vcc in the asm string when we switch VOPC over to having
implicit vcc defs.

Reviewers: arsenm

Subscribers: llvm-commits

Differential Revision: http://reviews.llvm.org/D11813

llvm-svn: 244362
2015-08-07 22:00:56 +00:00
Matt Arsenault 711b390a7c AMDGPU: Assume SMRD access for constant address space
Since r243294 these are selected to SMRD and
moved later if required.

llvm-svn: 244354
2015-08-07 20:18:34 +00:00
Chad Rosier 9659de379d [ARM] Remove an unused reference to MachineRegisterInfo. NFC.
llvm-svn: 244334
2015-08-07 17:02:29 +00:00
Tom Stellard c8733e805e AMDGPU/SI: Use correct encoding of vopc for VI in the assembler
Summary: We were using the SI encoding for VI.

Reviewers: arsenm

Subscribers: llvm-commits

Differential Revision: http://reviews.llvm.org/D11812

llvm-svn: 244332
2015-08-07 16:45:33 +00:00
Tom Stellard 85656cabfb AMDGPU/SI: v_mac_legacy_f32 does not exist on VI
Reviewers: arsenm

Subscribers: llvm-commits

Differential Revision: http://reviews.llvm.org/D11810

llvm-svn: 244322
2015-08-07 15:34:30 +00:00
Tom Stellard 11f19f78f0 AMDGPU/SI: Remove unused outs parameter from VOPC TableGen classes
Reviewers: arsenm

Subscribers: llvm-commits

Differential Revision: http://reviews.llvm.org/D11809

llvm-svn: 244321
2015-08-07 15:34:27 +00:00
Silviu Baranga a07090f7fa Fix unused variable warning introduced in r244314
llvm-svn: 244315
2015-08-07 12:05:46 +00:00
Silviu Baranga 3e8e51c1a9 [ARM] Update ReconstructShuffle to handle mismatched types
Summary:
Port the ReconstructShuffle function from AArch64 to ARM
to handle mismatched incoming types in the BUILD_VECTOR
node.

This fixes an outstanding FIXME in the ReconstructShuffle
code.

Reviewers: t.p.northover, rengolin

Subscribers: aemerson, llvm-commits, rengolin

Differential Revision: http://reviews.llvm.org/D11720

llvm-svn: 244314
2015-08-07 11:40:46 +00:00
JF Bastien 315cc06840 WebAssembly: textual emission uses expected opcode names
Summary: WebAssembly's tablegen instructions have the names WebAssembly expects, but by LLVM convention they're uppercase and suffixed with their type after an underscore. Leave the C++ code that way, but print out the names WebAssembly expects (lowercase, no type). We could teach tablegen to do this later, maybe by using `!cast<string>(node)` in the .td files.
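
The name mapping described above could look roughly like the following sketch. `printedOpcodeName` is a hypothetical helper, and the input names used in the examples (such as `ADD_I32`) are assumed for illustration rather than taken from the .td files.

```cpp
#include <cctype>
#include <string>

// Turn an LLVM-style instruction name into the printed form:
// drop the "_<type>" suffix (if any) and lowercase the rest.
std::string printedOpcodeName(const std::string& tdName) {
  // rfind returns npos when there is no underscore; substr(0, npos)
  // then keeps the whole string.
  const std::string::size_type underscore = tdName.rfind('_');
  std::string base = tdName.substr(0, underscore);
  for (char& c : base) {
    c = static_cast<char>(std::tolower(static_cast<unsigned char>(c)));
  }
  return base;
}
```

With these assumed inputs, `ADD_I32` prints as `add` and a name without a suffix is only lowercased.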

Reviewers: sunfish

Subscribers: jfb, llvm-commits

Differential Revision: http://reviews.llvm.org/D11776

llvm-svn: 244305
2015-08-07 01:57:03 +00:00
Juergen Ributzka f09c7a3d0f [AArch64][FastISel] Always use AND before checking the branch flag.
When we are not emitting the condition for the branch, because the condition is
in another BB or SDAG did the selection for us, then we have to mask the flag in
the register with AND.

This is required when the condition comes from a truncate, because SDAG only
truncates down to a legal size of i32.
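
The hazard can be sketched outside the compiler: an i1 condition kept in a 32-bit register only guarantees bit 0, so testing the whole register can give the wrong answer. The two functions below are illustrative names, not FastISel code.

```cpp
#include <cstdint>

// Naive test: treats the whole register as the condition, so garbage
// in the upper bits makes a false condition look true.
bool naiveBranchTaken(uint32_t reg) { return reg != 0; }

// Masked test: AND with 1 first, so only bit 0 (the i1 value) decides.
bool maskedBranchTaken(uint32_t reg) { return (reg & 1u) != 0; }
```

With a register holding 0xFFFFFFFE (bit 0 clear, upper bits set), the naive test takes the branch while the masked test does not.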

This fixes rdar://problem/22161062.

llvm-svn: 244291
2015-08-06 22:44:15 +00:00
Juergen Ributzka 9f54dbe7a1 Revert "[AArch64][FastISel] Add more truncation tests." and "[AArch64][FastISel] Always use an AND instruction when truncating to non-legal types."
This reverts commit r243198 and 243304.

Turns out this wasn't the correct fix for this problem. It works only within
FastISel, but fails when the truncate is selected by SDAG.

llvm-svn: 244287
2015-08-06 22:13:48 +00:00
Pete Cooper ebcd748927 Convert a bunch of loops to foreach. NFC.
After r244074, we now have a successors() method to iterate over
all the successors of a TerminatorInst.  This commit changes a bunch
of eligible loops to use it.

llvm-svn: 244260
2015-08-06 20:22:46 +00:00
Tom Stellard d488605ed3 AMDGPU/SI: Add Fiji support
Patch by: Alex Deucher

llvm-svn: 244255
2015-08-06 19:43:02 +00:00
Tom Stellard 217361c33f AMDGPU/SI: Add support for 32-bit immediate SMRD offsets on CI
Reviewers: arsenm

Subscribers: llvm-commits

Differential Revision: http://reviews.llvm.org/D11604

llvm-svn: 244254
2015-08-06 19:28:38 +00:00
Tom Stellard dee26a2876 AMDGPU/SI: Use ComplexPatterns for SMRD addressing modes
Summary: This allows us to consolidate several of the TableGen patterns.

Reviewers: arsenm

Subscribers: llvm-commits

Differential Revision: http://reviews.llvm.org/D11602

llvm-svn: 244253
2015-08-06 19:28:30 +00:00
Nico Rieck 78199518c4 Rename inst_range() to instructions() for consistency. NFC
llvm-svn: 244248
2015-08-06 19:10:45 +00:00
Chad Rosier 22eb71056d [AArch64] Use a static function and other minor cleanup for readability. NFC.
llvm-svn: 244233
2015-08-06 17:37:18 +00:00
Chad Rosier f77e909f0a [AArch64] Improve the readability of the ld/st optimization pass. NFC.
llvm-svn: 244222
2015-08-06 15:50:12 +00:00
Douglas Katzman 63d64da0ce [SPARC] Don't compare arch name as a string, use the enum instead.
Fixes PR22695

llvm-svn: 244221
2015-08-06 15:44:12 +00:00
Michael Liao 66233b7d79 Removing trailing whitespaces
llvm-svn: 244203
2015-08-06 09:06:20 +00:00
Michael Kuperstein 868dc65444 [X86] Improve EmitLoweredSelect for contiguous CMOV pseudo instructions.
This change improves EmitLoweredSelect() so that multiple contiguous CMOV pseudo
instructions with the same (or exactly opposite) conditions get lowered using a single
new basic-block. This eliminates unnecessary extra basic-blocks (and CFG merge points)
when contiguous CMOVs are being lowered.

Patch by: kevin.b.smith@intel.com
Differential Revision: http://reviews.llvm.org/D11428

llvm-svn: 244202
2015-08-06 08:45:34 +00:00
Alex Lorenz 49873a8382 MIR Serialization: Initial serialization of the machine operand target flags.
This commit implements the initial serialization of the machine operand target
flags. It extends the 'TargetInstrInfo' class to add two new methods that help
to provide text based serialization for the target flags.

This commit can serialize only the X86 target flags, and the target flags for
the other targets will be serialized in the follow-up commits.

Reviewers: Duncan P. N. Exon Smith
llvm-svn: 244185
2015-08-06 00:44:07 +00:00
JF Bastien 0f8a99b62f x86: NFC remove needless InstrCompiler cast
Summary: The casts from String to PatFrag weren't needed if we instead provided an SDNode. This fix was suggested by @pete in D11382.

Subscribers: pete, llvm-commits

Differential Revision: http://reviews.llvm.org/D11788

llvm-svn: 244167
2015-08-05 23:15:37 +00:00
Bjarke Hammersholt Roune 5cbc7d2999 [NVPTX] Use LDG for pointer induction variables.
More specifically, make NVPTXISelDAGToDAG able to emit cached loads (LDG) for pointer induction variables.

Also fix latent bug where LDG was not restricted to kernel functions. I believe that this could not be triggered so far since we do not currently infer that a pointer is global outside a kernel function, and only loads of global pointers are considered for cached loads.

llvm-svn: 244166
2015-08-05 23:11:57 +00:00
David Blaikie 3affe6e264 -Wdeprecated: Remove some dead code that was relying on a questionable (rule-of-3-violating) copy ctor in MCInstPrinter
llvm-svn: 244133
2015-08-05 21:15:48 +00:00
Krzysztof Parzyszek eca6f04074 [Hexagon] Edit a comment. NFC
llvm-svn: 244130
2015-08-05 21:08:26 +00:00
JF Bastien 8662083770 x86 atomic: optimize a.store(reg op a.load(acquire), release)
Summary: PR24191 finds that the expected memory-register operations aren't generated when relaxed { load ; modify ; store } is used. This is similar to PR17281 which was addressed in D4796, but only for memory-immediate operations (and for memory orderings up to acquire and release). This patch also handles some floating-point operations.
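
For reference, the source-level shape named in the title looks like this in C++. The function name is invented, and the sketch says nothing about which instructions are selected; it only shows the load-modify-store idiom.

```cpp
#include <atomic>

// a.store(reg op a.load(acquire), release) with op = '+'.
// Note this is NOT an atomic read-modify-write: another thread could
// store between the load and the store. It suits single-writer data.
void addInto(std::atomic<int>& a, int reg) {
  a.store(reg + a.load(std::memory_order_acquire),
          std::memory_order_release);
}
```

Because the load and store are separate atomic operations, a compiler is free to fold them into one memory-operand instruction where the target's memory model makes that safe.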

Reviewers: reames, kcc, dvyukov, nadav, morisset, chandlerc, t.p.northover, pete

Subscribers: llvm-commits

Differential Revision: http://reviews.llvm.org/D11382

llvm-svn: 244128
2015-08-05 21:04:59 +00:00
JF Bastien 7c4218f49c Revert "Fix MO's analyzePhysReg, it was confusing sub- and super-registers. Problem pointed out by Michael Hordijk."
I mistakenly committed the patch for D6629, and was trying to commit another. Reverting until it gets proper signoff.

llvm-svn: 244121
2015-08-05 20:53:56 +00:00
JF Bastien ce5256f5c5 Fix MO's analyzePhysReg, it was confusing sub- and super-registers. Problem pointed out by Michael Hordijk.
llvm-svn: 244120
2015-08-05 20:49:46 +00:00
Krzysztof Parzyszek 73e66f323a [Hexagon] Implement TargetTransformInfo for Hexagon
Author: Brendon Cahoon <bcahoon@codeaurora.org>
llvm-svn: 244089
2015-08-05 18:35:37 +00:00
Chandler Carruth 93205eb966 [TTI] Make the cost APIs in TargetTransformInfo consistently use 'int'
rather than 'unsigned' for their costs.

For something like costs in particular there is a natural "negative"
value, that of savings or saved cost. As a consequence, there is a lot
of code that subtracts or creates negative values based on cost, all of
which is prone to awkwardness or bugs when dealing with an unsigned
type. Similarly, we *never* want these values to wrap, as that would
cause Very Bad code generation (likely perceived as an infinite loop as
we try to emit over 2^32 instructions or some such insanity).

All around 'int' seems a much better fit for these basic metrics. I've
added asserts to ensure that at least the TTI interface never returns
negative numbers here. If we ever have a use case for negative numbers,
we can remove this, but this way a bug where someone used '-1' to
produce a 'very large' cost will be caught by the assert.
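
A minimal sketch of the hazard described above, using hypothetical helper names (netCostUnsigned, netCostSigned, checkedCost) that are not part of the TargetTransformInfo API: subtracting a saving larger than the cost wraps with 'unsigned' but stays a small negative number with 'int', and an assert at the interface boundary catches negative results.

```cpp
#include <cassert>
#include <climits>

// Hypothetical helpers, not LLVM code.
// With 'unsigned', a saving larger than the cost wraps to a huge value.
unsigned netCostUnsigned(unsigned Cost, unsigned Savings) {
  return Cost - Savings; // wraps modulo 2^N when Savings > Cost
}

// With 'int', the same subtraction yields a negative number that still
// reads as "a saving".
int netCostSigned(int Cost, int Savings) {
  return Cost - Savings;
}

// A boundary check in the spirit of the asserts the message describes:
// a negative cost (for example '-1' used as "very large") is caught here.
int checkedCost(int Cost) {
  assert(Cost >= 0 && "cost returned through the interface must not be negative");
  return Cost;
}
```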

This passes all tests, and is also UBSan clean.

No functional change intended.

Differential Revision: http://reviews.llvm.org/D11741

llvm-svn: 244080
2015-08-05 18:08:10 +00:00
Pete Cooper 3ae0ee5453 Move BB succ_iterator to be inside TerminatorInst. NFC.
To get the successors of a BB we currently do successors(BB) which
ultimately walks the successors of the BB's terminator.

This moves the iterator to TerminatorInst as that's what we're actually
using to do the iteration, and adds a member function to TerminatorInst
to allow us to iterate directly over successors given an instruction.

For example, we can now do

  for (auto *Succ : BI->successors())

instead of

  for (unsigned i = 0, e = BI->getNumSuccessors(); i != e; ++i)
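
The two loop forms can be compared in a self-contained sketch. Block and Terminator below are hypothetical stand-ins, not LLVM's BasicBlock and TerminatorInst; the only point is that a successors() member returning a range makes the range-based loop possible.

```cpp
#include <cassert>
#include <utility>
#include <vector>

// Hypothetical stand-ins; not LLVM's classes.
struct Block { int Id; };

class Terminator {
  std::vector<Block *> Succs;

public:
  explicit Terminator(std::vector<Block *> S) : Succs(std::move(S)) {}
  unsigned getNumSuccessors() const {
    return static_cast<unsigned>(Succs.size());
  }
  Block *getSuccessor(unsigned I) const { return Succs[I]; }
  // Exposing the successors as a range enables range-based for loops.
  const std::vector<Block *> &successors() const { return Succs; }
};

// Index-based walk, as in the older form.
int sumIdsIndexed(const Terminator &T) {
  int Sum = 0;
  for (unsigned i = 0, e = T.getNumSuccessors(); i != e; ++i)
    Sum += T.getSuccessor(i)->Id;
  return Sum;
}

// Range-based walk over the same successors.
int sumIdsRange(const Terminator &T) {
  int Sum = 0;
  for (auto *Succ : T.successors())
    Sum += Succ->Id;
  return Sum;
}

// Usage: both walks visit the same blocks.
void successorsExample() {
  Block A{1}, B{2};
  std::vector<Block *> S{&A, &B};
  Terminator T(S);
  assert(sumIdsIndexed(T) == sumIdsRange(T));
}
```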

Reviewed by Tobias Grosser.

llvm-svn: 244074
2015-08-05 17:43:01 +00:00
Chad Rosier 69e3eb3c79 [AArch64] Register AArch64DeadRegisterDefinition pass with LLVM pass manager.
llvm-svn: 244067
2015-08-05 17:35:34 +00:00
James Y Knight bce20afe0f [Sparc] Fix disassembly of popc instruction.
And add tests.

Patch by David Wiberg!

llvm-svn: 244064
2015-08-05 17:00:30 +00:00
Matt Arsenault 95f0606e62 AMDGPU/SI: Remove EXECReg
For the same reasons as the other physical registers.

llvm-svn: 244062
2015-08-05 16:42:57 +00:00
Matt Arsenault 4c0487bff6 AMDGPU: Remove SCCReg.
These should be handled as a physical register rather
than a virtual register class with one member.

llvm-svn: 244061
2015-08-05 16:42:54 +00:00
Chad Rosier 1c81432eb6 [AArch64] Register (existing) AArch64BranchRelaxation pass with LLVM pass manager.
Summary: Among other things, this allows -print-after-all/-print-before-all to
dump IR around this pass.

llvm-svn: 244060
2015-08-05 16:12:10 +00:00
Chad Rosier 0c6c5fc303 [AArch64] Make the naming of the Address Type Promotion pass consistent.
llvm-svn: 244057
2015-08-05 15:32:23 +00:00
Chad Rosier 794b9b2fdd [AArch64] Register (existing) AArch64AdvSIMDScalar pass with LLVM pass manager.
Summary: Among other things, this allows -print-after-all/-print-before-all to
dump IR around this pass.

IIRC, this pass is off by default, but it's still helpful when debugging.

llvm-svn: 244056
2015-08-05 15:18:58 +00:00
Chad Rosier 084b78632e Make this less error prone by using a #define. NFC.
llvm-svn: 244048
2015-08-05 14:48:44 +00:00
Chad Rosier 9378c16ac8 [AArch64] Register (existing) AArch64ExpandPseudo pass with LLVM pass manager.
Summary: Among other things, this allows -print-after-all/-print-before-all to
dump IR around this pass.

llvm-svn: 244046
2015-08-05 14:22:53 +00:00
Chad Rosier 96530b3a43 [AArch64] Register (existing) AArch64LoadStoreOpt pass with LLVM pass manager.
Summary: Among other things, this allows -print-after-all/-print-before-all to
dump IR around this pass.

This is the AArch64 version of r243052.

llvm-svn: 244041
2015-08-05 13:44:51 +00:00
Chad Rosier 43f5c84cfc Update comment. NFC.
llvm-svn: 244038
2015-08-05 12:40:13 +00:00
Artyom Skrobov 6fbef2a780 ARMISelDAGToDAG.cpp had this self-contradictory code:
return StringSwitch<int>(Flags)
          .Case("g", 0x1)
          .Case("nzcvq", 0x2)
          .Case("nzcvqg", 0x3)
          .Default(-1);
...

  // The _g and _nzcvqg versions are only valid if the DSP extension is
  // available.
  if (!Subtarget->hasThumb2DSP() && (Mask & 0x2))
    return -1;

ARMARM confirms that the comment is right, and the code was wrong.
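
The contradiction can be reproduced with a small stand-alone model. The functions below are hypothetical and only mirror the quoted mapping; rejectedPerComment is an assumption about what the comment asks for (testing the "g" bit, 0x1), not necessarily the exact form of the fix.

```cpp
#include <cassert>
#include <string>

// Stand-alone model of the quoted flag mapping; not the LLVM source.
int flagsToMask(const std::string &Flags) {
  if (Flags == "g") return 0x1;
  if (Flags == "nzcvq") return 0x2;
  if (Flags == "nzcvqg") return 0x3;
  return -1;
}

// The quoted check tests bit 0x2, which is set for "nzcvq" and "nzcvqg".
bool rejectedByQuotedCheck(int Mask, bool HasDSP) {
  return !HasDSP && (Mask & 0x2) != 0;
}

// What the comment describes: reject when the "g" bit (0x1) is set.
// Assumption: this reflects the intent of the fix only.
bool rejectedPerComment(int Mask, bool HasDSP) {
  return !HasDSP && (Mask & 0x1) != 0;
}

// Usage: without the DSP extension, the quoted check rejects "nzcvq"
// and lets "g" through, the opposite of what the comment says.
void maskExample() {
  const bool HasDSP = false;
  assert(rejectedByQuotedCheck(flagsToMask("nzcvq"), HasDSP));
  assert(!rejectedByQuotedCheck(flagsToMask("g"), HasDSP));
  assert(rejectedPerComment(flagsToMask("g"), HasDSP));
  assert(!rejectedPerComment(flagsToMask("nzcvq"), HasDSP));
}
```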

llvm-svn: 244029
2015-08-05 11:02:14 +00:00
Tanya Lattner 0d28f80bd1 Rename all references to old mailing lists to new lists.llvm.org address.
llvm-svn: 243999
2015-08-05 03:51:17 +00:00
Sanjay Patel 924879ad2c wrap OptSize and MinSize attributes for easier and consistent access (NFCI)
Create wrapper methods in the Function class for the OptimizeForSize and MinSize
attributes. We want to hide the logic of "or'ing" them together when optimizing
just for size (-Os).

Currently, we are not consistent about this and rely on a front-end to always set
OptimizeForSize (-Os) if MinSize (-Oz) is on. Thus, there are 18 FIXME changes here
that should be added as follow-on patches with regression tests.

This patch is NFC-intended: it just replaces existing direct accesses of the attributes
by the equivalent wrapper call.
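
A minimal sketch of the wrapper idea, assuming a simplified FunctionAttrs struct that stands in for the Function class; the method names optForSize and optForMinSize are used here for illustration and the real signatures may differ.

```cpp
#include <cassert>

// Hypothetical model of a function's size attributes; not LLVM's Function.
struct FunctionAttrs {
  bool OptimizeForSize = false; // set for -Os
  bool MinSize = false;         // set for -Oz

  // True when optimizing for size at all: hides the "or" of both
  // attributes, so callers need not assume MinSize implies OptimizeForSize.
  bool optForSize() const { return OptimizeForSize || MinSize; }

  // True only for the stricter minimum-size mode.
  bool optForMinSize() const { return MinSize; }
};

// Usage: a function carrying only MinSize still counts as "opt for size".
void sizeAttrsExample() {
  FunctionAttrs F;
  F.MinSize = true;
  assert(F.optForSize());
  assert(F.optForMinSize());
}
```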

Differential Revision: http://reviews.llvm.org/D11734

llvm-svn: 243994
2015-08-04 15:49:57 +00:00
Sanjay Patel 75ced2782b [x86] machine combiner reassociation: mark EFLAGS operand as 'dead'
In the commentary for D11660, I wasn't sure if it was alright to create new
integer machine instructions without also creating the implicit EFLAGS operand. 
From what I can see, the implicit operand is always created by the MachineInstrBuilder
based on the instruction type, so we don't have to do that explicitly. However, in
reviewing the debug output, I noticed that the operand was not marked as 'dead'. 
The machine combiner should do that to preserve future optimization opportunities 
that may be checking for that dead EFLAGS operand themselves.

Differential Revision: http://reviews.llvm.org/D11696

llvm-svn: 243990
2015-08-04 15:21:56 +00:00
Vasileios Kalintiris 2f12b2ede5 [mips][FastISel] Disable code generation for unsupported targets through FastISel.
Summary:
Previously, we would check whether the target is supported or not, only in
fastSelectInstruction(). This means that 64-bit targets could use FastISel too.
We fix this by checking every overridden method of the FastISel class and
by falling back to SelectionDAG if the target isn't supported. This change
should have been committed along with r243638, but somehow I missed it.

Reviewers: dsanders

Subscribers: llvm-commits

Differential Revision: http://reviews.llvm.org/D11755

llvm-svn: 243986
2015-08-04 14:35:50 +00:00
Vasileios Kalintiris 044e172228 Revert r229675 - [mips] Avoid redundant sign extension of the result of binary bitwise instructions.
It introduced two regressions on 64-bit big-endian targets running under N32
(MultiSource/Benchmarks/tramp3d-v4/tramp3d-v4, and
MultiSource/Applications/kimwitu++/kc) The issue is that on 64-bit targets
comparisons such as BEQ compare the whole GPR64 but incorrectly tell the
instruction selector that they operate on GPR32's. This leads to the
elimination of i32->i64 extensions that are actually required by
comparisons to work correctly.

There's currently a patch under review that fixes this problem.

llvm-svn: 243984
2015-08-04 14:26:35 +00:00
Saleem Abdulrasool 0a2672bb43 ARM: support windows division routines
This adds the software division routines for the Windows RTABI.  These are not
expected to be used often though as most modern Windows ARM capable targets
support hardware division.  In the case that the target CPU doesn't support
hardware division, this will be the fallback.

llvm-svn: 243952
2015-08-04 03:57:56 +00:00
Saleem Abdulrasool 67697a7ea9 ARM: make Darwin libcall registration table driven (NFC)
Make the libcall updating table driven, similar to the approach that the Linux
and Windows codepaths use below.  NFC.

llvm-svn: 243951
2015-08-04 03:57:52 +00:00
Ahmed Bougacha 81fda188f9 [AArch64] Rename FP formats to be more consistent. NFC.
Some are named "FP", others "SD", others still "FP*SD".
Rename all this to just use "FP", which, except for conversions
(which don't use this format naming scheme), implies "SD" anyway.

llvm-svn: 243936
2015-08-04 01:38:08 +00:00
Ahmed Bougacha e0e12db8c8 [AArch64] Add isel support for f16 indexed LD/ST.
llvm-svn: 243935
2015-08-04 01:29:38 +00:00
Ahmed Bougacha e8ea9ac32b [AArch64][v8.1a] The "pan" sysreg isn't MSR-specific. NFCI.
It's already in SysRegMappings, no need to also have it in MSRMappings:
the latter is only used if we didn't find a match in the former.

llvm-svn: 243933
2015-08-04 00:55:11 +00:00
Ahmed Bougacha 0cbe2efcd6 [AArch64] Remove unnecessary "break". NFC.
llvm-svn: 243931
2015-08-04 00:49:08 +00:00
Ahmed Bougacha 239d635d3d [AArch64] Use SDValue bool operator. NFC.
llvm-svn: 243930
2015-08-04 00:48:02 +00:00
Ahmed Bougacha b0ae36f0d1 [AArch64] Vector FCOPYSIGN supports Custom-lowering: mark it as such.
There's a bunch of code in LowerFCOPYSIGN that does smart lowering, and
is actually already vector-aware; let's use it instead of scalarizing!

The only interesting change is that for v2f32, we previously always used
v4i32 as the integer vector type.
Use v2i32 instead, and mark FCOPYSIGN as Custom.

llvm-svn: 243926
2015-08-04 00:42:34 +00:00
Tim Northover 9c340ec6fd ARM: remove horrible printf left over from debugging
llvm-svn: 243907
2015-08-03 22:19:08 +00:00
Pete Cooper 7be8f8f018 Convert some AArch64 code to foreach loops. NFC.
Also converted a cast<> to dyn_cast while I was working on the same
line of code.

llvm-svn: 243894
2015-08-03 19:04:32 +00:00
Tim Northover 910dde7ab2 ARM: prefer allocating VFP regs at stride 4 on Darwin.
This is necessary for WatchOS support, where the compact unwind format assumes
this kind of layout. For now we only want this on Swift-like CPUs though, where
it's been the Xcode behaviour for ages. Also, since it can expand the prologue
we don't want it at -Oz.

llvm-svn: 243884
2015-08-03 17:20:10 +00:00
John Brawn f3324cf1a5 [ARM] Make GlobalMerge merge extern globals by default
Enabling merging of extern globals appears to be generally either beneficial or
harmless. On some benchmark suites (on Cortex-M4F, Cortex-A9, and Cortex-A57)
it gives improvements in the 1-5% range, but in the rest the overall effect is
zero.

Differential Revision: http://reviews.llvm.org/D10966

llvm-svn: 243874
2015-08-03 12:13:33 +00:00
James Molloy 6967e5e4a3 Be less conservative about forming IT blocks.
In http://reviews.llvm.org/rL215382, IT forming was made more conservative under
the belief that a flag-setting instruction was unpredictable inside an IT block on ARMv6M.

But actually, ARMv6M doesn't even support IT blocks so that's impossible. In the ARMARM for
v7M, v7AR and v8AR it states that the semantics of such an instruction change inside an
IT block - it doesn't set the flags. So actually it is fine to use one inside an IT block
as long as the flags register is dead afterwards.

This gives significant performance improvements in a variety of MPEG based workloads.

Differential revision: http://reviews.llvm.org/D11680

llvm-svn: 243869
2015-08-03 09:24:48 +00:00
JF Bastien fda53373f2 WebAssembly: implement getScalarShiftAmountTy so we can shift by amount, with type
Summary: This currently sets the shift amount RHS to the same type as the LHS, and assumes that the LHS is a simple type. This isn't currently the case e.g. with weird integer sizes, but will eventually be true and will assert if not. That's what you get for having an experimental backend: break it and you get to keep both pieces. Most backends either set the RHS to MVT::i32 or MVT::i64, but WebAssembly is a virtual ISA and tries to have regular-looking binary operations where both operands are the same type (even if a 64-bit RHS shifter is slightly silly, hey it's free!).

Subscribers: llvm-commits, sunfish, jfb

Differential Revision: http://reviews.llvm.org/D11715

llvm-svn: 243860
2015-08-03 00:00:11 +00:00
Craig Topper e3dcce9700 De-constify pointers to Type since they can't be modified. NFC
This was already done in most places a while ago. This just fixes the ones that crept in over time.

llvm-svn: 243842
2015-08-01 22:20:21 +00:00
Jingyue Wu ffa09be222 [NVPTX] allow register copy between float and int
Summary:
Fixes PR24303. With Bruno's WIP (D11197) on PeepholeOptimizer, across-class
register copying (e.g. i32 to f32) becomes possible. Enhance
NVPTXInstrInfo::copyPhysReg to handle these cases.

Reviewers: jholewinski

Subscribers: eliben, jholewinski, llvm-commits, bruno

Differential Revision: http://reviews.llvm.org/D11622

llvm-svn: 243839
2015-08-01 18:02:12 +00:00
David Blaikie 78633802c2 -Wdeprecated-clean: Fix cases of violating the rule of 5 in ways that are deprecated in C++11
Remove some unnecessary explicit special members in Hexagon that, once
removed, allow the other implicit special members to be used without
depending on deprecated features.
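
As a sketch of the language rule involved (WithDtor and RuleOfZero are hypothetical types, not the Hexagon classes this commit touches): a user-declared destructor makes the implicitly generated copy operations deprecated in C++11 and suppresses the implicit move operations, while a class with no explicit special members gets all of them implicitly.

```cpp
#include <cassert>
#include <utility>
#include <vector>

// Hypothetical types, not the classes changed by this commit.
// The user-declared destructor makes the implicit copy constructor and
// copy assignment deprecated, and no implicit move operations are generated.
struct WithDtor {
  std::vector<int> Data;
  ~WithDtor() {} // unnecessary: the vector cleans up after itself
};

// With no explicit special members, copy, move and destruction are all
// implicit and none of them relies on a deprecated feature.
struct RuleOfZero {
  std::vector<int> Data;
};

// Usage: the implicit move constructor transfers the vector's contents.
void moveExample() {
  RuleOfZero A{{1, 2, 3}};
  RuleOfZero B = std::move(A);
  assert(B.Data.size() == 3);
}
```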

llvm-svn: 243825
2015-08-01 05:31:27 +00:00
JF Bastien 8f9aea08d4 WebAssembly: handle more than int32 argument/return
Summary: Also test 64-bit integers, except shifts for now which are broken because isel dislikes the 32-bit truncate that precedes them.

Reviewers: sunfish

Subscribers: llvm-commits, jfb

Differential Revision: http://reviews.llvm.org/D11699

llvm-svn: 243822
2015-08-01 04:48:44 +00:00
David Blaikie a5fd382eb3 -Wdeprecated-clean: Fix cases of violating the rule of 5 in ways that are deprecated in C++11
Various targets use std::swap on specific MCAsmOperands (ARM and
possibly Hexagon as well). It might be helpful to mark those subclasses
as final, to ensure that the availability of move/copy operations can't
lead to slicing. (same sort of requirements as the non-virtual dtor -
protected or a final class)

llvm-svn: 243820
2015-08-01 04:40:41 +00:00
Alex Lorenz b4d0d6a345 AMDGPU/SI: Add implicit register operands in the correct order.
This commit fixes a bug in the class 'SIInstrInfo' where the implicit register
machine operands were added to a machine instruction in an incorrect order -
the implicit uses were added before the implicit defs.

I found this bug while working on moving the implicit register operand
verification code from the MIR parser to the machine verifier.

This commit also makes the method 'addImplicitDefUseOperands' in the machine
instruction class public so that it can be reused in the 'SIInstrInfo' class.

Reviewers: Matt Arsenault

Differential Revision: http://reviews.llvm.org/D11689

llvm-svn: 243799
2015-07-31 23:30:09 +00:00
Jingyue Wu cf70053b20 [NVPTX] convert pointers in byval kernel arguments to global
Summary:
For example, in

  struct S {
    int *x;
    int *y;
  };
  __global__ void foo(S s) {
    int *b = s.y;
    // use b
  }

"b" is guaranteed to point to global. NVPTX should emit ld.global/st.global for
accessing "b".

Reviewers: jholewinski

Subscribers: llvm-commits, jholewinski

Differential Revision: http://reviews.llvm.org/D11505

llvm-svn: 243790
2015-07-31 21:44:14 +00:00
JF Bastien 4a2d56044f WebAssembly: handle `ret void`.
Summary:
Use -1 as numoperands for the return SDTypeProfile, denoting that return is variadic. Note that the patterns in InstrControl.td still need to match the inputs, so this isn't an "anything goes" variadic on ret!

The next step will be to handle other local types (not just int32).

Reviewers: sunfish

Subscribers: llvm-commits, jfb

Differential Revision: http://reviews.llvm.org/D11692

llvm-svn: 243783
2015-07-31 21:04:18 +00:00
JF Bastien e71e653a5f x86: check hasOpaqueSPAdjustment in canRealignStack
Summary:
@rnk pointed out in [1] that x86's canRealignStack logic should match that in CantUseSP from hasBasePointer.

  [1]: http://reviews.llvm.org/D11160?id=29713#inline-89350

Reviewers: rnk

Subscribers: rnk, llvm-commits

Differential Revision: http://reviews.llvm.org/D11377

llvm-svn: 243772
2015-07-31 18:28:09 +00:00
JF Bastien d7fcc6f9c7 WebAssembly: handle unused function arguments.
Subscribers: llvm-commits, sunfish, jfb

Differential Revision: http://reviews.llvm.org/D11684

llvm-svn: 243770
2015-07-31 18:13:27 +00:00
JF Bastien 600aee9805 WebAssembly: print basic integer assembly.
Summary:
This prints assembly for int32 integer operations defined in WebAssemblyInstrInteger.td only, with major caveats:

  - The operation names are currently incorrect.
  - Other integer and floating-point types will be added later.
  - The printer isn't factored out to handle recursive AST code yet, since it can't even handle control flow anyways.
  - The assembly format isn't full s-expressions yet either, this will be added later.
  - This currently disables PrologEpilogCodeInserter as well as MachineCopyPropagation because they don't like virtual registers, which WebAssembly likes quite a bit. This will be fixed by factoring out NVPTX's change (currently a fork of PrologEpilogCodeInserter).

Reviewers: sunfish

Subscribers: llvm-commits, jfb

Differential Revision: http://reviews.llvm.org/D11671

llvm-svn: 243763
2015-07-31 17:53:38 +00:00
Sanjay Patel 9ff4626028 [x86] reassociate integer multiplies using machine combiner pass
Add i16, i32, i64 imul machine instructions to the list of reassociation
candidates.

A new bit of logic is needed to handle integer instructions: they have an
implicit EFLAGS operand, so we have to make sure it's dead in order to do
any reassociation with integer ops.

Differential Revision: http://reviews.llvm.org/D11660

llvm-svn: 243756
2015-07-31 16:21:55 +00:00
Geoff Berry 8a7ef3b2ee [AArch64] Favor extended reg patterns for sub
Summary:
Favor the extended reg patterns over the shifted reg patterns that match
only the operand shift and not the full sign/zero extend and shift.

Reviewers: jmolloy, t.p.northover

Subscribers: mcrosier, aemerson, llvm-commits, rengolin

Differential Revision: http://reviews.llvm.org/D11569

llvm-svn: 243753
2015-07-31 15:55:54 +00:00
Jingyue Wu 4be014aebe Refactor: Simplify boolean conditional return statements in lib/Target/NVPTX
Summary: Use clang-tidy to simplify boolean conditional return statements

Reviewers: rafael, echristo, chandlerc, bkramer, craig.topper, dexonsmith, chapuni, eliben, jingyue, jholewinski

Subscribers: llvm-commits, jholewinski

Differential Revision: http://reviews.llvm.org/D9983

llvm-svn: 243734
2015-07-31 05:09:47 +00:00
Matt Arsenault e1ce344b5a AMDGPU: Fix v16i32 to v16i8 truncstore
llvm-svn: 243731
2015-07-31 04:12:04 +00:00
Matt Arsenault ba01337942 AMDGPU/SI: Set DwarfRegNum
This requires a fix in tablegen for the cast<int> from bits<16>
to work in the list initializer.

llvm-svn: 243723
2015-07-31 01:12:10 +00:00
Tom Stellard 82325598c3 AMDGPU/SI: Remove unused pattern for f32 constant loads
Reviewers: arsenm

Subscribers: llvm-commits

Differential Revision: http://reviews.llvm.org/D11603

llvm-svn: 243719
2015-07-31 01:02:32 +00:00
Sumanth Gundapaneni 532a13691c [ARM] Lower modulo operation to generate __aeabi_divmod on Android
For a modulo (remainder) operation,
clang -target armv7-none-linux-gnueabi generates "__modsi3"
clang -target armv7-none-eabi generates "__aeabi_idivmod"
clang -target armv7-linux-androideabi generates "__modsi3"
Android bionic libc doesn't provide a __modsi3, instead it provides a
"__aeabi_idivmod". This patch fixes the LLVM ARMISelLowering to generate
the correct call whenever there is a modulo operation.

Differential Revision: http://reviews.llvm.org/D11661

llvm-svn: 243717
2015-07-31 00:45:12 +00:00
Sanjay Patel 1166f2ff9f fix memcpy/memset/memmove lowering when optimizing for size
Fixing MinSize attribute handling was discussed in D11363. 
This is a prerequisite patch to doing that.

The handling of OptSize when lowering mem* functions was broken
on Darwin because it wants to ignore -Os for these cases, but the
existing logic also made it ignore -Oz (MinSize).

The Linux change demonstrates a widespread problem. The backend
doesn't usually recognize the MinSize attribute by itself; it
assumes that if the MinSize attribute exists, then the OptSize 
attribute must also exist. 

Fixing this more generally will be a follow-on patch or two.

Differential Revision: http://reviews.llvm.org/D11568

llvm-svn: 243693
2015-07-30 21:41:50 +00:00
Matt Arsenault 7a0c3a92c0 AMDGPU: Set SubRegIndex size and offset
I'm not sure what reasons the comment here could have
had for not setting these. Without these set, there is
an assertion hit during DWARF emission.

llvm-svn: 243661
2015-07-30 17:03:11 +00:00
Matt Arsenault b39e858356 AMDGPU: Fix unreachable when emitting binary debug info
Copy implementation of applyFixup from AArch64 with AArch64 bits
ripped out.

Tests will be included with a later commit. Several other
problems must be fixed before binary debug info emission
will work.

llvm-svn: 243660
2015-07-30 17:03:08 +00:00
Tom Stellard 4229aa942d AMDGPU/SI: Simplify moveSMRDToVALU()
Summary:
Replace the switch on instruction opcode with a switch on register size.
This way we don't need to update the switch statement when we add new
SMRD variants.

Reviewers: arsenm

Subscribers: llvm-commits

Differential Revision: http://reviews.llvm.org/D11601

llvm-svn: 243652
2015-07-30 16:20:42 +00:00
Tom Stellard 9d74076065 AMDGPU/SI: Remove isTriviallyReMaterializable() function from SIInstrInfo
Summary:
This function is never called.  isReallyTriviallyReMaterializable() is
the function that should be implemented instead.

Reviewers: arsenm

Subscribers: llvm-commits

Differential Revision: http://reviews.llvm.org/D11620

llvm-svn: 243651
2015-07-30 16:20:40 +00:00
Vasileios Kalintiris 2041b1dd0b [mips][FastISel] Remove hidden mips-fast-isel option.
Summary:
This hidden option would disable code generation through FastISel by
default. It was removed from the available options and from the
Fast-ISel tests that required it in order to run the tests.

Reviewers: dsanders

Subscribers: qcolombet, llvm-commits

Differential Revision: http://reviews.llvm.org/D11610

llvm-svn: 243638
2015-07-30 12:39:33 +00:00
Vasileios Kalintiris 77fb0a3dcf [mips][FastISel] Apply only zero-extension to constants prior to their materialization.
Summary:
Previously, we would sign-extend non-boolean negative constants and
zero-extend otherwise. This was problematic for PHI instructions with
negative values that had a type with bitwidth less than that of the
register used for materialization.

More specifically, ComputePHILiveOutRegInfo() assumes the constants
present in a PHI node are zero extended in their container and
afterwards deduces the known bits.

For example, previously we would materialize an i16 -4 with the
following instruction:

  addiu $r, $zero, -4

The register would end-up with the 32-bit 2's complement representation
of -4. However, ComputePHILiveOutRegInfo() would generate a constant
with the upper 16-bits set to zero. The SelectionDAG builder would use
that information to generate an AssertZero node that would remove any
subsequent trunc & zero_extend nodes.
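
The mismatch for the i16 -4 example can be checked with plain integer arithmetic. The helper names below are hypothetical; they only model the two 32-bit register contents being compared.

```cpp
#include <cassert>
#include <cstdint>

// Sign-extension: the value 'addiu $r, $zero, -4' leaves in a 32-bit register.
uint32_t signExtend16(int16_t V) {
  return static_cast<uint32_t>(static_cast<int32_t>(V));
}

// Zero-extension: the representation ComputePHILiveOutRegInfo() is
// described as assuming, with the upper 16 bits cleared.
uint32_t zeroExtend16(int16_t V) {
  return static_cast<uint32_t>(static_cast<uint16_t>(V));
}

// Usage: the two representations of i16 -4 differ in the upper 16 bits.
void extensionExample() {
  assert(signExtend16(-4) == 0xFFFFFFFCu);
  assert(zeroExtend16(-4) == 0x0000FFFCu);
}
```

Because the known-bits analysis reports the upper half as zero while the register actually holds 0xFFFF there, removing a later zero_extend changes the observed value.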

In theory, we should modify ComputePHILiveOutRegInfo() to consult
target-specific hooks about the way they prefer to materialize the
given constants. However, git-blame reports that this specific code
has not been touched since 2011 and it seems to be working well for every
target so far.

Reviewers: dsanders

Subscribers: llvm-commits

Differential Revision: http://reviews.llvm.org/D11592

llvm-svn: 243636
2015-07-30 11:51:44 +00:00
Michael Kuperstein cdb076b8d4 [X86] Recognize "flags" as an identifier, not a register in Intel-syntax inline asm
Patch by: marina.yatsina@intel.com
Differential Revision: http://reviews.llvm.org/D11512

llvm-svn: 243630
2015-07-30 10:10:25 +00:00
Sanjay Patel 5bfbb36a09 push fast-math check for machine-combiner reassociations into instruction-type check; NFC
This makes it simpler to add instruction types that don't depend on fast-math.

llvm-svn: 243596
2015-07-30 00:04:21 +00:00
Nick Lewycky c3890d2969 Fix typo "fuction" noticed in comments in AssumptionCache.h, and also all the other files that have the same typo. All comments, no functionality change! (Merely a "fuctionality" change.)
Bonus change to remove emacs major mode marker from SystemZMachineFunctionInfo.cpp because emacs already knows it's C++ from the extension. Also fix typo "appeary" in AMDGPUMCAsmInfo.h.

llvm-svn: 243585
2015-07-29 22:32:47 +00:00
Eric Christopher d566fb12a1 Rename hasCompatibleFunctionAttributes->areInlineCompatible based
on suggestions. Currently the function is only used for inline purposes
and this is more descriptive for the use.

llvm-svn: 243578
2015-07-29 22:09:48 +00:00
Simon Pilgrim ba10f76705 [X86][SSE] Keep 32-bit target i64 vector shifts on SSE unit.
This patch improves the 32-bit target i64 constant matching to detect the shuffle vector splats that are introduced by i64 vector shift vectorization (D8416).

Differential Revision: http://reviews.llvm.org/D11327

llvm-svn: 243577
2015-07-29 21:44:27 +00:00
Tim Northover 2a9d801fd5 AArch64: use 32-bit MOV rather than UBFX to truncate registers.
It's potentially more efficient on Cyclone, and from the optimization guides &
schedulers looks like it has no effect on Cortex-A53 or A57. In general you'd
expect a MOV to be about the most efficient instruction with its semantics,
even though the official "UXTW" alias is really a UBFX.

llvm-svn: 243576
2015-07-29 21:34:32 +00:00
Simon Pilgrim 86478c6909 [X86][SSE] Vectorize i64 ASHR operations
This patch vectorizes the v2i64/v4i64 ASHR shift operations - the last remaining integer vector shifts that are still being transferred to/from the scalar unit to be completed.

Differential Revision: http://reviews.llvm.org/D11439

llvm-svn: 243569
2015-07-29 20:31:45 +00:00
Jingyue Wu 3a04dc6e78 Roll forward r242871
r242871 missed one place that should be guarded with isPhysicalReg. This patch
fixes that.

llvm-svn: 243555
2015-07-29 18:59:09 +00:00
Bruno Cardoso Lopes 38c0250679 Revert "[PeepholeOptimizer] Look through PHIs to find additional register sources"
Reported to break some internal tests: PR24303

This reverts commit r243486.

llvm-svn: 243540
2015-07-29 17:46:47 +00:00
Tim Northover cf739b8c3d AArch64: use AddressingModes.h accessors for compare shifts
No functional change because "lsl #12" is actually encoded as 12, but one less
bug if someone ever decides to change that for the giggles.

llvm-svn: 243536
2015-07-29 16:39:56 +00:00
Jingyue Wu 7ec38530a5 Temporarily revert r242871
PR24299

llvm-svn: 243522
2015-07-29 15:26:11 +00:00
Bill Schmidt 42ddd71120 [PPC] Fix PR24216: Don't generate splat for misaligned shuffle mask
Given certain shuffle-vector masks, LLVM emits splat instructions
which splat the wrong bytes from the source register.  The issue is
that the function PPC::isSplatShuffleMask() in PPCISelLowering.cpp
does not ensure that the splat pattern found is requesting bytes that
are aligned on an EltSize boundary.  This patch detects this situation
as not a valid splat mask, resulting in a permute being generated
instead of a splat.
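
The alignment condition can be sketched as follows. isAlignedSplatMask is a hypothetical, simplified check and not PPC::isSplatShuffleMask() itself; it assumes the mask holds one source-byte index per result byte and it does not handle undefined (negative) mask entries.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical simplified check, not the LLVM implementation.
// Mask: one source-byte index per result byte. EltSize: element size in bytes.
bool isAlignedSplatMask(const std::vector<int> &Mask, unsigned EltSize) {
  if (EltSize == 0 || Mask.empty() || Mask.size() % EltSize != 0)
    return false;
  // The splatted element must start on an EltSize boundary.
  if (Mask[0] < 0 || static_cast<unsigned>(Mask[0]) % EltSize != 0)
    return false;
  // The first element must be EltSize consecutive bytes.
  for (unsigned i = 1; i < EltSize; ++i)
    if (Mask[i] != Mask[0] + static_cast<int>(i))
      return false;
  // Every later element must repeat the first one.
  for (std::size_t i = EltSize; i < Mask.size(); ++i)
    if (Mask[i] != Mask[i - EltSize])
      return false;
  return true;
}

// Usage: bytes 4..7 form an aligned 4-byte element, bytes 5..8 do not.
void splatExample() {
  std::vector<int> Aligned = {4, 5, 6, 7, 4, 5, 6, 7, 4, 5, 6, 7, 4, 5, 6, 7};
  std::vector<int> Misaligned = {5, 6, 7, 8, 5, 6, 7, 8, 5, 6, 7, 8, 5, 6, 7, 8};
  assert(isAlignedSplatMask(Aligned, 4));
  assert(!isAlignedSplatMask(Misaligned, 4));
}
```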

Patch and test case by Tyler Kenney, cleaned up a bit by me.

This is a simple bug fix that would be good to incorporate into 3.7.

llvm-svn: 243519
2015-07-29 14:31:57 +00:00
Akira Hatanaka f53b0403f8 [AArch64] Define subtarget feature strict-align.
This commit defines subtarget feature strict-align and uses it instead of
cl::opt -aarch64-strict-align to decide whether strict alignment should be
forced.

rdar://problem/21529937

llvm-svn: 243516
2015-07-29 14:17:26 +00:00
Alex Lorenz d8a1e542ab Fix broken ArrayRef conversion from r243497.
llvm-svn: 243501
2015-07-28 23:34:27 +00:00
Sanjay Patel 1dd15598cf fix TLI's combineRepeatedFPDivisors interface to return the minimum user threshold
This fix was suggested as part of D11345 and is part of fixing PR24141.

With this change, we can avoid walking the uses of a divisor node if the target
doesn't want the combineRepeatedFPDivisors transform in the first place.

No functional change is intended other than that.

Differential Revision: http://reviews.llvm.org/D11531

llvm-svn: 243498
2015-07-28 23:05:48 +00:00
Alex Lorenz ef5c196fb0 MIR Serialization: Serialize the target index machine operands.
Reviewers: Duncan P. N. Exon Smith
llvm-svn: 243497
2015-07-28 23:02:45 +00:00
Akira Hatanaka 2670f4a550 [ARM] Define subtarget feature strict-align.
This commit defines subtarget feature strict-align and uses it instead of
cl::opt -arm-strict-align to decide whether strict alignment should be
forced. Also, remove the logic that was checking the OS and architecture
as clang is now responsible for setting strict-align based on the command
line options specified and the target architecture and OS.

rdar://problem/21529937

http://reviews.llvm.org/D11470

llvm-svn: 243493
2015-07-28 22:44:28 +00:00
Tim Northover 17ae83a25f AArch64: be careful of large immediates when optimising cmps.
llvm-svn: 243492
2015-07-28 22:42:32 +00:00
Bruno Cardoso Lopes 3c235763e5 [PeepholeOptimizer] Look through PHIs to find additional register sources
Reapply 243271 with more fixes; although we are not handling multiple
sources with coalescable copies, we were not properly skipping this
case.

- Teaches the ValueTracker in the PeepholeOptimizer to look through PHI
instructions.
- Add findNextSourceAndRewritePHI method to lookup into multiple sources
returned by the ValueTracker and rewrite PHIs with new sources.

With these changes we can find more register sources and rewrite more
copies to allow coalescing of bitcast instructions. Hence, we eliminate
unnecessary VR64 <-> GR64 copies in x86, but it could be extended to
other archs by marking "isBitcast" on target specific instructions. The
x86 example follows:

A:
  psllq %mm1, %mm0
  movd  %mm0, %r9
  jmp C

B:
  por %mm1, %mm0
  movd  %mm0, %r9
  jmp C

C:
  movd  %r9, %mm0
  pshufw  $238, %mm0, %mm0

Becomes:

A:
  psllq %mm1, %mm0
  jmp C

B:
  por %mm1, %mm0
  jmp C

C:
  pshufw  $238, %mm0, %mm0

Differential Revision: http://reviews.llvm.org/D11197
rdar://problem/20404526

llvm-svn: 243486
2015-07-28 21:45:50 +00:00
Vasileios Kalintiris 9876946aee [mips][FastISel] Fix call lowering by bailing out on "fastcc" calls.
Summary:
Currently, we support only the MIPS O32 ABI calling convention for call
lowering. With this change we avoid using the O32 calling convention for
lowering calls marked as using the fast calling convention.

Reviewers: dsanders

Subscribers: llvm-commits

Differential Revision: http://reviews.llvm.org/D11515

llvm-svn: 243485
2015-07-28 21:43:31 +00:00
Vasileios Kalintiris 9ec6114860 [mips][FastISel] Fix generated code for IR's select instruction.
Summary:
Generate correct code for the select instruction by zero-extending
its boolean/condition operand to GPR-width. This is necessary because
the conditional-move instructions operate on the whole register.

Reviewers: dsanders

Subscribers: llvm-commits

Differential Revision: http://reviews.llvm.org/D11506

llvm-svn: 243469
2015-07-28 19:57:25 +00:00
Matt Arsenault 7227cc1a48 AMDGPU: Don't try to use LDS/vector for private if pointer value stored
If the pointer is the store's value operand, this would produce
a broken module. Make sure the use is actually for the pointer operand.

llvm-svn: 243462
2015-07-28 18:47:00 +00:00
Matt Arsenault fdcd39a8ad AMDGPU: Fix crash if called function is a bitcast
getCalledFunction() is null, so this would crash. Replace
crash with an error on unsupported call.

llvm-svn: 243461
2015-07-28 18:29:14 +00:00
Matt Arsenault 916cea5682 AMDGPU: Fix return type of getImplicitParameterOffset.
Patch by Zoltan Gilian <zoltan.gilian@gmail.com>

llvm-svn: 243459
2015-07-28 18:09:55 +00:00
JF Bastien ae7eebd429 WebAssembly: MCAsmInfo only has one syntax variant for now.
Summary: MCAsmInfo is set up with the default AssemblerDialect, which is zero.

Subscribers: llvm-commits, sunfish, jfb

Differential Revision: http://reviews.llvm.org/D11567

llvm-svn: 243452
2015-07-28 17:23:07 +00:00
Chih-Hung Hsieh 1e859582d6 Implement target independent TLS compatible with glibc's emutls.c.
The 'common' section TLS is not implemented.
Current C/C++ TLS variables are not placed in common section.
DWARF debug info to get the address of TLS variables is not generated yet.

clang and driver changes in http://reviews.llvm.org/D10524

  Added -femulated-tls flag to select the emulated TLS model,
  which will be used for old targets like Android that do not
  support ELF TLS models.

Added TargetLowering::LowerToTLSEmulatedModel as a target-independent
function to convert a SDNode of TLS variable address to a function call
to __emutls_get_address.

Added into lib/Target/*/*ISelLowering.cpp to call LowerToTLSEmulatedModel
for TLSModel::Emulated. Although all targets supporting ELF TLS models are
enhanced, emulated TLS model has been tested only for Android ELF targets.
Modified AsmPrinter.cpp to print the emutls_v.* and emutls_t.* variables for
emulated TLS variables.
Modified DwarfCompileUnit.cpp to skip some DIE for emulated TLS variables.

TODO: Add proper DIE for emulated TLS variables.
      Added new unit tests with emulated TLS.

Differential Revision: http://reviews.llvm.org/D10522

llvm-svn: 243438
2015-07-28 16:24:05 +00:00
Geoff Berry c573bf7a5f [AArch64] Match float round and convert to int instructions.
Summary:
Add patterns for doing floating point round with various rounding modes
followed by conversion to int as a single FCVT* instruction.

Reviewers: t.p.northover, jmolloy

Subscribers: aemerson, rengolin, mcrosier, llvm-commits

Differential Revision: http://reviews.llvm.org/D11424

llvm-svn: 243422
2015-07-28 15:24:10 +00:00
Adhemerval Zanella 7bc3319d84 Implement __builtin_thread_pointer
This patch adds the AArch64 lowering of __builtin_thread_pointer.  It uses
the already implemented AArch64ISD::THREAD_POINTER used in TLS generation.

llvm-svn: 243412
2015-07-28 13:03:31 +00:00
Michael Kuperstein cba308cf96 [X86] Remove mergeSPUpdatesUp()
X86FrameLowering has both a mergeSPUpdates() that accepts a direction, and a
mergeSPUpdatesUp(), which seem to do the same thing, except for a slightly
different interface. Removed the less general function.
NFC.

Differential Revision: http://reviews.llvm.org/D11510

llvm-svn: 243396
2015-07-28 08:56:13 +00:00
Simon Pilgrim df984f58ad [X86][SSE] Use bitmasks instead of shuffles where possible.
VPAND is a lot faster than VPSHUFB and VPBLENDVB - this patch ensures we attempt to lower to a basic bitmask before lowering to the slower byte shuffle/blend instructions.

Split off from D11518.

Differential Revision: http://reviews.llvm.org/D11541

llvm-svn: 243395
2015-07-28 08:54:41 +00:00
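As a rough illustration of why a bitmask can stand in for a shuffle or blend: when every result lane is either the source lane in its original position or zero, the shuffle is the same as an AND with an all-ones/all-zeros lane mask. The sketch below uses a four-lane byte vector and hypothetical helper names; it is not the X86 lowering code.

```cpp
#include <cstdint>

struct Vec4 { std::uint8_t lane[4]; };
struct Mask4 { int idx[4]; };

// Generic two-input shuffle: indices 0..3 pick from a, 4..7 pick from b.
constexpr Vec4 shuffle(Vec4 a, Vec4 b, Mask4 m) {
  Vec4 r{};
  for (int i = 0; i < 4; ++i)
    r.lane[i] = m.idx[i] < 4 ? a.lane[m.idx[i]] : b.lane[m.idx[i] - 4];
  return r;
}

// Lane-wise AND, the operation a plain bitmask instruction performs.
constexpr Vec4 bitAnd(Vec4 a, Vec4 mask) {
  Vec4 r{};
  for (int i = 0; i < 4; ++i)
    r.lane[i] = static_cast<std::uint8_t>(a.lane[i] & mask.lane[i]);
  return r;
}

constexpr bool equal(Vec4 x, Vec4 y) {
  for (int i = 0; i < 4; ++i)
    if (x.lane[i] != y.lane[i]) return false;
  return true;
}
```

Shuffling `{1,2,3,4}` with an all-zero vector using mask `{0,5,2,7}` gives the same lanes as AND-ing `{1,2,3,4}` with `{0xFF,0,0xFF,0}`.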
Igor Breger 8352a0ddf2 AVX512: Implemented encoding and intrinsics for VGETEXPSS/D instructions
Added tests for intrinsics and encoding.

Differential Revision: http://reviews.llvm.org/D11528

llvm-svn: 243390
2015-07-28 06:53:28 +00:00
Sanjay Patel 8c13e3680d fix invalid load folding with SSE/AVX FP logical instructions (PR22371)
This is a follow-up to the FIXME that was added with D7474 ( http://reviews.llvm.org/rL229531 ).
I thought this load folding bug had been made hard-to-hit, but it turns out to be very easy
when targeting 32-bit x86 and causes a miscompile/crash in Wine:
https://bugs.winehq.org/show_bug.cgi?id=38826
https://llvm.org/bugs/show_bug.cgi?id=22371#c25

The quick fix is to simply remove the scalar FP logical instructions from the load folding table
in X86InstrInfo, but that causes us to miss load folds that should be possible when lowering fabs,
fneg, fcopysign. So the majority of this patch is altering those lowerings to use *vector* FP
logical instructions (because that's all x86 gives us anyway). That lets us do the load folding 
legally.

Differential Revision: http://reviews.llvm.org/D11477

llvm-svn: 243361
2015-07-28 00:48:32 +00:00
JF Bastien 088c47ee5b WebAssembly: add a generic CPU
Summary: WebAssemblySubtarget.cpp expects a default 'generic' CPU to exist, and this seems to be prevalent with other targets. It makes sense to have something between MVP and bleeding-edge, even though for now it's the same as MVP. This removes a warning that's currently generated.

Subscribers: jfb, llvm-commits, sunfish

Differential Revision: http://reviews.llvm.org/D11546

llvm-svn: 243345
2015-07-27 23:25:54 +00:00
JF Bastien 6c6efa1786 WebAssembly: more MCAsmInfo nits.
Summary: As suggested by sunfish.

Subscribers: jfb, llvm-commits, sunfish

Differential Revision: http://reviews.llvm.org/D11544

llvm-svn: 243339
2015-07-27 22:40:31 +00:00
Alexandros Lamprineas 4ea707555a - Added support for parsing HWDiv features using Target Parser.
- Architecture extensions are represented as a bitmap.

Phabricator: http://reviews.llvm.org/D11457
llvm-svn: 243335
2015-07-27 22:26:59 +00:00
Colin LeMahieu fe2c8b8015 [llvm-mc] Pushing plumbing through for --fatal-warnings flag.
llvm-svn: 243334
2015-07-27 21:56:53 +00:00
Sanjay Patel 1cf245fd96 remove unnecessary forward declaration; NFC
llvm-svn: 243328
2015-07-27 21:11:55 +00:00
Sanjay Patel aa99a2304d don't repeat function names in comments; NFC
llvm-svn: 243327
2015-07-27 21:03:03 +00:00
JF Bastien 1a12bf1aa2 WebAssembly: minor MCAsmInfo fixes
Summary:
Fix pointer / callee-save stack slot size.
Update comment character to be LISP-ish.

Subscribers: llvm-commits, sunfish, jfb

Differential Revision: http://reviews.llvm.org/D11537

llvm-svn: 243326
2015-07-27 20:46:51 +00:00
Bruno Cardoso Lopes b20841df44 Revert "[PeepholeOptimizer] Look through PHIs to find additional register sources"
Still breaks some ARM buildbots. This reverts r243271.

llvm-svn: 243318
2015-07-27 20:26:04 +00:00
Akira Hatanaka 2541e0241c [AArch64] Remove check for Darwin that was needed to decide if x18 should
be reserved.

The decision to reserve x18 is going to be made solely by the front-end,
so it isn't necessary to check if the OS is Darwin in the backend.

llvm-svn: 243308
2015-07-27 19:18:47 +00:00
Diego Novillo cd973c4f77 Fix ODR violation. NFC.
There is an ODR conflict between lib/ExecutionEngine/ExecutionEngineBindings.cpp
and lib/Target/TargetMachineC.cpp. The inline definitions should simply
be marked static (thanks dblaikie for the hint).

llvm-svn: 243298
2015-07-27 18:27:23 +00:00
Marek Olsak 93df060871 AMDGPU: don't match vgpr loads for constant loads
Author: Dave Airlie <airlied@redhat.com>

In order to implement indirect sampler loads, we don't
want to match on a VGPR load but an SGPR one for constants,
as we cannot feed VGPRs to the sampler, only SGPRs.

This should be applicable for llvm 3.7 as well.

llvm-svn: 243294
2015-07-27 18:16:08 +00:00
Sanjay Patel beb4cffb43 fix typo and spacing; NFC
llvm-svn: 243287
2015-07-27 17:39:20 +00:00
Pete Cooper 2e20147403 Revert "Add const to some Type* parameters which didn't need to be mutable. NFC."
This reverts commit r243146.

Feedback from Craig Topper and David Blaikie was that we don't put const on Type as it has no mutable state.

llvm-svn: 243282
2015-07-27 17:15:24 +00:00
Bruno Cardoso Lopes 669c921bfd [PeepholeOptimizer] Look through PHIs to find additional register sources
Reapply r242295 with fixes in the implementation.

- Teaches the ValueTracker in the PeepholeOptimizer to look through PHI
instructions.
- Add findNextSourceAndRewritePHI method to look up multiple sources
returned by the ValueTracker and rewrite PHIs with new sources.

With these changes we can find more register sources and rewrite more
copies to allow coalescing of bitcast instructions. Hence, we eliminate
unnecessary VR64 <-> GR64 copies in x86, but it could be extended to
other archs by marking "isBitcast" on target specific instructions. The
x86 example follows:

A:
  psllq %mm1, %mm0
  movd  %mm0, %r9
  jmp C

B:
  por %mm1, %mm0
  movd  %mm0, %r9
  jmp C

C:
  movd  %r9, %mm0
  pshufw  $238, %mm0, %mm0

Becomes:

A:
  psllq %mm1, %mm0
  jmp C

B:
  por %mm1, %mm0
  jmp C

C:
  pshufw  $238, %mm0, %mm0

Differential Revision: http://reviews.llvm.org/D11197
rdar://problem/20404526

llvm-svn: 243271
2015-07-27 14:39:46 +00:00
Silviu Baranga 7581d22512 [ARM/AArch64] Fix cost model for interleaved accesses
Summary:
Fix the cost of interleaved accesses for ARM/AArch64.
We were calling getTypeAllocSize and using it to check
the number of bits, when we should have called
getTypeAllocSizeInBits instead.

This would potentially cause the vectorizer to
generate loads/stores and shuffles which cannot
be matched with an interleaved access instruction.

No performance changes are expected for now since
matching/generating interleaved accesses is still
disabled by default.

Reviewers: rengolin

Subscribers: aemerson, llvm-commits, rengolin

Differential Revision: http://reviews.llvm.org/D11524

llvm-svn: 243270
2015-07-27 14:39:34 +00:00
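The byte/bit mix-up fixed in the cost model above can be sketched with a simplified size model. The helper names and the 64/128-bit register check are assumptions for illustration, and the model ignores the alignment padding that a real alloc size includes.

```cpp
#include <cstdint>

// Simplified alloc size of a vector type in bytes (no alignment padding).
constexpr std::uint64_t allocSizeInBytes(unsigned numElts, unsigned eltBits) {
  return static_cast<std::uint64_t>(numElts) * ((eltBits + 7) / 8);
}

// The same size expressed in bits.
constexpr std::uint64_t allocSizeInBits(unsigned numElts, unsigned eltBits) {
  return 8 * allocSizeInBytes(numElts, eltBits);
}

// Assumed legality check: the sub-vector must fill a 64-bit or 128-bit register.
constexpr bool fitsVectorReg(std::uint64_t sizeInBits) {
  return sizeInBits == 64 || sizeInBits == 128;
}
```

For a `<4 x i32>` sub-vector, the byte count (16) fails the check while the bit count (128) passes, so feeding a byte size into a bit-based check gives the wrong answer.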
Simon Pilgrim 81accb7b27 [X86] Reordered lowerVectorShuffleAsBitMask before lowerVectorShuffleAsBlend. NFCI.
Allows us to show diffs for D11518 more clearly

llvm-svn: 243264
2015-07-27 12:37:19 +00:00
Marek Olsak 1354b87695 AMDGPU/SI: Fix the V_FRACT_F64 SI bug workaround
This is a candidate for 3.7.

llvm-svn: 243263
2015-07-27 11:37:42 +00:00
Sean Silva e1c6b549ef Avoid using uncommon acronym "MSROM".
llvm-svn: 243256
2015-07-27 00:46:59 +00:00
Igor Breger f2460112ad Implemented encoding and intrinsics of the following instructions
vunpckhps/pd, vunpcklps/pd, 
  vpunpcklbw, vpunpckhbw, vpunpcklwd, vpunpckhwd, vpunpckldq, vpunpckhdq, vpunpcklqdq, vpunpckhqdq
Added tests for intrinsics and encoding.

Differential Revision: http://reviews.llvm.org/D11509

llvm-svn: 243246
2015-07-26 14:41:44 +00:00
Juergen Ributzka 6364985b58 [AArch64][FastISel] Always use an AND instruction when truncating to non-legal types.
When truncating to non-legal types (such as i16, i8 and i1) always use an AND
instruction to mask out the upper bits. This was only done when the source type
was an i64, but not when the source type was an i32.

This commit fixes this and adds the missing i32 truncate tests.

This fixes rdar://problem/21990703.

llvm-svn: 243198
2015-07-25 02:16:53 +00:00
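The masking described in the truncation fix above amounts to keeping only the low bits of the source register. A minimal sketch, assuming a 32-bit source register and a hypothetical helper name:

```cpp
#include <cstdint>

// Truncate a 32-bit register value to a narrower, non-legal width (1, 8, or
// 16 bits) by AND-ing away the upper bits. Requires 0 < bits < 32.
constexpr std::uint32_t truncToBits(std::uint32_t reg, unsigned bits) {
  return reg & ((1u << bits) - 1u);
}
```

For example, `truncToBits(0x12345678, 8)` keeps `0x78` and `truncToBits(0x12345678, 16)` keeps `0x5678`.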
Eric Christopher f0024d14f1 Fix PPCMaterializeInt to check the size of the integer based on the
extension property we're requesting - zero or sign extended.

This fixes cases where we want to return a zero extended 32-bit -1
and not be sign extended for the entire register. Also updated the
already out of date comment with the current behavior.

llvm-svn: 243192
2015-07-25 00:48:08 +00:00
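The difference between a zero-extended and a sign-extended 32-bit -1 in a 64-bit register, which the fix above depends on, can be shown directly. The function name is hypothetical and the sketch only models the register contents, not the PPC FastISel code.

```cpp
#include <cstdint>

// Widen a 32-bit constant to a 64-bit register value, either zero-extended
// or sign-extended depending on what the caller requests.
constexpr std::uint64_t materialize32(std::int32_t v, bool zeroExtend) {
  return zeroExtend
             ? static_cast<std::uint64_t>(static_cast<std::uint32_t>(v))
             : static_cast<std::uint64_t>(static_cast<std::int64_t>(v));
}
```

Zero-extending -1 leaves the upper 32 bits clear (`0x00000000FFFFFFFF`), while sign-extending fills the entire register with ones.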
Eric Christopher 03df7ac8a9 PPCMaterializeInt should only take a ConstantInt so represent this in the prototype
and fix up all uses.

llvm-svn: 243191
2015-07-25 00:48:06 +00:00
Akira Hatanaka 0d4c9ea6e0 [AArch64] Define subtarget feature "reserve-x18", which is used to decide
whether register x18 should be reserved.

This change is needed because we cannot use a backend option to set
cl::opt "aarch64-reserve-x18" when doing LTO.

Out-of-tree projects currently using cl::opt option "-aarch64-reserve-x18"
to reserve x18 should make changes to add subtarget feature "reserve-x18"
to the IR.

rdar://problem/21529937

Differential Revision: http://reviews.llvm.org/D11463

llvm-svn: 243186
2015-07-25 00:18:31 +00:00
Pete Cooper 7679afda82 Use make_range(rbegin(), rend()) to allow foreach loops. NFC.
Instead of the pattern

for (auto I = x.rbegin(), E = x.rend(); I != E; ++I)

we can use make_range to construct the reverse range and iterate using
that instead.

llvm-svn: 243163
2015-07-24 21:13:43 +00:00
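A self-contained sketch of the reverse-range pattern described above follows. `IterRange` and `makeRange` are minimal stand-ins written for this example (hypothetical names), not the LLVM `make_range` utility itself, and the sketch assumes a C++17 compiler.

```cpp
#include <array>
#include <iterator>

// Minimal iterator-pair wrapper so a pair of iterators works in a
// range-based for loop.
template <typename It> class IterRange {
  It First, Last;

public:
  constexpr IterRange(It F, It L) : First(F), Last(L) {}
  constexpr It begin() const { return First; }
  constexpr It end() const { return Last; }
};

template <typename It> constexpr IterRange<It> makeRange(It F, It L) {
  return IterRange<It>(F, L);
}

// Walk the container back to front and return the first non-zero element,
// or 0 if there is none.
constexpr int firstNonZeroFromBack(const std::array<int, 4> &Xs) {
  for (int V : makeRange(Xs.rbegin(), Xs.rend()))
    if (V != 0)
      return V;
  return 0;
}
```

The loop body no longer needs explicit iterator variables; the reverse iteration order is carried entirely by the range passed to the loop.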
Pete Cooper 098f7c1fcb Add const to some Type* parameters which didn't need to be mutable. NFC.
We were only getting the size of the type which doesn't need to modify
the type.

llvm-svn: 243146
2015-07-24 19:19:26 +00:00
Pete Cooper 0debbdc872 Use foreach loops for StructType::elements(). NFC.
We had a few places where we did

for (unsigned i = 0, e = STy->getNumElements(); i != e; ++i) {

but those could instead do

for (auto *EltTy : STy->elements()) {

llvm-svn: 243136
2015-07-24 18:55:49 +00:00
Igor Breger 074a64e72c AVX-512: Implemented encoding, DAG lowering and intrinsics for Integer Truncate with/without saturation
Added tests for DAG lowering, encoding and intrinsics

Differential Revision: http://reviews.llvm.org/D11218

llvm-svn: 243122
2015-07-24 17:24:15 +00:00
Mehdi Amini 26d481311a Remove access to the DataLayout in the TargetMachine
Summary:
Replace getDataLayout() with a createDataLayout() method to make
explicit that it is intended only to create a DataLayout and not
to access it for other purposes.

This change is the last of a series of commits dedicated to having a
single DataLayout during compilation by always using the one owned
by the module.

Reviewers: echristo

Subscribers: jholewinski, llvm-commits, rafael, yaron.keren

Differential Revision: http://reviews.llvm.org/D11103

(cherry picked from commit 5609fc56bca971e5a7efeaa6ca4676638eaec5ea)

From: Mehdi Amini <mehdi.amini@apple.com>
llvm-svn: 243114
2015-07-24 16:04:22 +00:00
Sanjay Patel 0495dbf1e1 fix wrong comment; NFC
llvm-svn: 243113
2015-07-24 16:02:14 +00:00
Luke Cheeseman 4d45ff2b87 [ARM] - Fix lowering of shufflevectors in AArch32
Some shufflevectors are currently being incorrectly lowered in the AArch32
backend as the existing checks for detecting the NEON operations from the
shufflevector instruction expect the shuffle mask and the vector operands to be
of the same length.

This is not always the case as the mask may be twice as long as the operand;
here only the lower half of the shufflemask gets checked, so provided the lower
half of the shufflemask looks like a vector transpose (or even is just all -1
for undef) then the intrinsics may get incorrectly lowered into a vector
transpose (VTRN) instruction.

This patch fixes this by accommodating both cases and adds regression tests.

Differential Revision: http://reviews.llvm.org/D11407

llvm-svn: 243103
2015-07-24 09:57:05 +00:00
Luke Cheeseman b5c627aba8 When lowering vector shifts a check is performed to see if the value to shift by
is an immediate; in this check the value is negated and stored in an int64_t.
The value can be -2^63, yet the negated result cannot be stored in an int64_t, which
is undefined behaviour and causes failures. The negation is only necessary
when the value is within a certain range, so there is no need to negate
-2^63; this patch introduces that range check and also a regression test.

Differential Revision: http://reviews.llvm.org/D11408

llvm-svn: 243100
2015-07-24 09:31:48 +00:00
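The hazard in the vector-shift commit above is that negating -2^63 overflows an int64_t. A minimal sketch of the "range-check first, negate second" idea, with a hypothetical helper name and an assumed valid range of [-eltBits, -1]:

```cpp
#include <cstdint>
#include <limits>

// Returns the positive shift amount encoded by a negative count, or 0 when
// the count is outside [-eltBits, -1]. The negation is only evaluated after
// the range check, so INT64_MIN is never negated.
constexpr std::int64_t negatedShiftAmount(std::int64_t cnt,
                                          std::int64_t eltBits) {
  return (cnt >= -eltBits && cnt <= -1) ? -cnt : 0;
}
```

Because the function is `constexpr`, evaluating it on `INT64_MIN` inside a constant expression would be rejected by the compiler if the negation actually overflowed, which makes the guard easy to check at compile time.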
Mehdi Amini 5d8e569926 Revert "Remove access to the DataLayout in the TargetMachine"
This reverts commit 0f720d984f419c747709462f7476dff962c0bc41.

It breaks clang too badly, I need to prepare a proper patch for clang
first.

From: Mehdi Amini <mehdi.amini@apple.com>
llvm-svn: 243089
2015-07-24 03:36:55 +00:00
Alexei Starovoitov 01886a05b8 [bpf] initial support for debug_info
llvm-svn: 243087
2015-07-24 03:17:08 +00:00
Mehdi Amini b4bc424c9a Remove access to the DataLayout in the TargetMachine
Summary:
Replace getDataLayout() with a createDataLayout() method to make
explicit that it is intended only to create a DataLayout and not
to access it for other purposes.

This change is the last of a series of commits dedicated to having a
single DataLayout during compilation by always using the one owned
by the module.

Reviewers: echristo

Subscribers: jholewinski, llvm-commits, rafael, yaron.keren

Differential Revision: http://reviews.llvm.org/D11103

(cherry picked from commit 5609fc56bca971e5a7efeaa6ca4676638eaec5ea)

From: Mehdi Amini <mehdi.amini@apple.com>
llvm-svn: 243083
2015-07-24 01:44:39 +00:00
Lawrence Hu 687097a0a9 test commit, only added one space
llvm-svn: 243070
2015-07-23 23:55:28 +00:00
David Gross d9c1bc9955 [ARM] Register (existing) ARMLoadStoreOpt pass with LLVM pass manager.
Summary: Among other things, this allows -print-after-all/-print-before-all to dump IR around this pass.

Subscribers: aemerson, llvm-commits, rengolin

Differential Revision: http://reviews.llvm.org/D11373

llvm-svn: 243052
2015-07-23 22:12:46 +00:00
David Gross 2ad5d173ce Test commit.
llvm-svn: 243046
2015-07-23 21:46:09 +00:00
Duncan P. N. Exon Smith d531322149 X86: Use dyn_cast instead of isa+cast, NFC
llvm-svn: 243034
2015-07-23 19:27:07 +00:00
Weiming Zhao b33a5557f4 This patch enables register coalescing to coalesce the following:
%vreg2<def> = MOVi32imm 1; GPR32:%vreg2
  %W1<def> = COPY %vreg2; GPR32:%vreg2
into:
  %W1<def> = MOVi32imm 1
Patched by Lawrence Hu (lawrence@codeaurora.org)

llvm-svn: 243033
2015-07-23 19:24:53 +00:00
Michael Kuperstein 454d145395 [X86] Allow load folding into PUSH instructions
Adds pushes to the folding tables.
This also required a fix to the TD definition, since the memory forms of 
the push instructions did not have the right mayLoad/mayStore flags.

Differential Revision: http://reviews.llvm.org/D11340

llvm-svn: 243010
2015-07-23 12:23:45 +00:00
Michael Kuperstein ffcc7663a2 [X86] Fix order of operands for ins and outs instructions when parsing intel syntax
Patch by: marina.yatsina@intel.com
Differential Revision: http://reviews.llvm.org/D11337

llvm-svn: 243001
2015-07-23 10:23:48 +00:00
Elena Demikhovsky 482b303254 X86: Fixed assertion failure in 32-bit mode
The DAG Node "SCALAR_TO_VECTOR" may be created if the type of the scalar element is legal.
Added a check for the scalar type before creating this node.
Added a test that fails with assertion on the current version.

Differential Revision: http://reviews.llvm.org/D11413

llvm-svn: 242994
2015-07-23 08:25:23 +00:00
Chandler Carruth fe414353db Revert r242990: "AVX-512: Implemented encoding , DAG lowering and ..."
This commit broke the build. Numerous build bots broken, and it was
blocking my progress so reverting.

It should be trivial to reproduce -- enable the BPF backend and it
should fail when running llvm-tblgen.

llvm-svn: 242992
2015-07-23 08:03:44 +00:00
Igor Breger da1b2ea955 AVX-512: Implemented encoding, DAG lowering and intrinsics for Integer Truncate with/without saturation
Added tests for DAG lowering, encoding and intrinsics

Differential Revision: http://reviews.llvm.org/D11218

llvm-svn: 242990
2015-07-23 07:39:21 +00:00
Igor Breger 87e6397fb1 AVX: Fix ISA disabling in case of AVX512VL; some instructions should be disabled only if AVX512BW and AVX512VL are present.
Tests added.

Differential Revision: http://reviews.llvm.org/D11414

llvm-svn: 242987
2015-07-23 07:11:14 +00:00
Jingyue Wu 6a3fdeca22 [NVPTX] run LSR before straight-line optimizations
Summary:
Straight-line optimizations can simplify the loop body and make LSR's
cost analysis more precise. This significantly improves several Eigen3
CUDA benchmarks.

With this change, EigenContractionKernel runs up to 40% faster
(753ceee5f2/unsupported/Eigen/CXX11/src/Tensor/TensorContractionCuda.h?at=default#cl-502).
EigenConvolutionKernel2D runs up to 10% faster
(753ceee5f2/unsupported/Eigen/CXX11/src/Tensor/TensorConvolution.h?at=default#cl-605).

I have some difficulties writing small tests that benefit from this
reordering due to an apparent issue with LSR (being discussed at
http://lists.cs.uiuc.edu/pipermail/llvmdev/2015-July/088244.html).

See the review thread for the compilation time impact of GVN. 

Reviewers: eliben, jholewinski

Subscribers: llvm-commits, jholewinski

Differential Revision: http://reviews.llvm.org/D11304

llvm-svn: 242982
2015-07-23 04:59:07 +00:00
Sanjay Patel efad9eb914 fix typo; NFC
llvm-svn: 242947
2015-07-22 21:56:41 +00:00