Commit Graph

14195 Commits

Author SHA1 Message Date
JF Bastien 3428ed4f53 WebAssembly: don't omit dead vregs from locals
Summary:
This is a temporary hack until we get around to remapping the vreg
numbers to local numbers. Dead vregs cause bad numbering and make
consumers sad.

We could also just look at debug info an use named locals instead, but
vregs have to work properly anyways so there!

Reviewers: binji, sunfish

Subscribers: jfb, llvm-commits, dschuff

Differential Revision: http://reviews.llvm.org/D13839

llvm-svn: 250594
2015-10-17 00:25:38 +00:00
JF Bastien 4f43e80ece WebAssembly: fix the syntax for comparisons
Summary: It has also slightly changed.

Reviewers: binji

Subscribers: jfb, dschuff, llvm-commits, sunfish

Differential Revision: http://reviews.llvm.org/D13837

llvm-svn: 250591
2015-10-17 00:12:29 +00:00
Joseph Tremoulet 55b51e9dcc [WinEH] Fix eh.exceptionpointer intrinsic lowering
Summary:
Some shared code for handling eh.exceptionpointer and eh.exceptioncode
needs to not share the part that truncates to 32 bits, which is intended
just for exception codes.

Reviewers: rnk

Subscribers: llvm-commits

Differential Revision: http://reviews.llvm.org/D13747

llvm-svn: 250588
2015-10-17 00:08:08 +00:00
Reid Kleckner 28e490342b [WinEH] Fix stack alignment in funclets and ParentFrameOffset calculation
Our previous value of "16 + 8 + MaxCallFrameSize" for ParentFrameOffset
is incorrect when CSRs are involved. We were supposed to have a test
case to catch this, but it wasn't very rigorous.

The main effect here is that calling _CxxThrowException inside a
catchpad doesn't immediately crash on MOVAPS when you have an odd number
of CSRs.

llvm-svn: 250583
2015-10-16 23:43:27 +00:00
Sanjay Patel bbd524496c [x86] promote 'add nsw' to a wider type to allow more combines
The motivation for this patch starts with PR20134:
https://llvm.org/bugs/show_bug.cgi?id=20134

void foo(int *a, int i) {
  a[i] = a[i+1] + a[i+2];
}

It seems better to produce this (14 bytes):

movslq	%esi, %rsi
movl	0x4(%rdi,%rsi,4), %eax
addl	0x8(%rdi,%rsi,4), %eax
movl	%eax, (%rdi,%rsi,4)

Rather than this (22 bytes):

leal	0x1(%rsi), %eax
cltq             
leal	0x2(%rsi), %ecx      
movslq	%ecx, %rcx     
movl	(%rdi,%rcx,4), %ecx
addl	(%rdi,%rax,4), %ecx
movslq	%esi, %rax       
movl	%ecx, (%rdi,%rax,4)

The most basic problem (the first test case in the patch combines constants) should also be fixed in InstCombine, 
but it gets more complicated after that because we need to consider architecture and micro-architecture. For
example, AArch64 may not see any benefit from the more general transform because the ISA solves the sexting in
hardware. Some x86 chips may not want to replace 2 ADD insts with 1 LEA, and there's an attribute for that: 
FeatureSlowLEA. But I suspect that doesn't go far enough or maybe it's not getting used when it should; I'm 
also not sure if FeatureSlowLEA should also mean "slow complex addressing mode".

I see no perf differences on test-suite with this change running on AMD Jaguar, but I see small code size
improvements when building clang and the LLVM tools with the patched compiler.

A more general solution to the sext(add nsw(x, C)) problem that works for multiple targets is available
in CodeGenPrepare, but it may take quite a bit more work to get that to fire on all of the test cases that
this patch takes care of.

Differential Revision: http://reviews.llvm.org/D13757

llvm-svn: 250560
2015-10-16 22:14:12 +00:00
Joseph Tremoulet d11a998e81 [WinEH] Fix CatchRetSuccessorColorMap accounting
Summary:
We now use the block for the catchpad itself, rather than its normal
successor, as the funclet entry.
Putting the normal successor in the map leads downstream funclet
membership computations to erroneous results.

Reviewers: majnemer, rnk

Subscribers: rnk, llvm-commits

Differential Revision: http://reviews.llvm.org/D13798

llvm-svn: 250552
2015-10-16 21:22:54 +00:00
Andrew Kaylor 09b39acc03 Fix assertion failure with fp128 to unsigned i64 conversion
Patch by Mitch Bodart

Differential Revision: http://reviews.llvm.org/D13780

llvm-svn: 250550
2015-10-16 20:39:20 +00:00
Krzysztof Parzyszek a7c5f0409c [Hexagon] Split double registers
llvm-svn: 250549
2015-10-16 20:38:54 +00:00
Krzysztof Parzyszek 5b7dd0cdf9 [Hexagon] Merge adjacent stores
llvm-svn: 250542
2015-10-16 19:43:56 +00:00
JF Bastien 6126d2b883 WebAssembly: fix load/store syntax
Summary: The syntax has changed a bit recently.

Reviewers: binji

Subscribers: llvm-commits, jfb, sunfish, dschuff

Differential Revision: http://reviews.llvm.org/D13821

llvm-svn: 250535
2015-10-16 18:24:42 +00:00
Joseph Tremoulet 53e9cbd95a [WinEH] Fix endpad coloring/numbering
Summary:
When a cleanup's cleanupendpad or cleanupret targets a catchendpad, stop
trying to propagate the cleanup's parent's color to the catchendpad, since
what's needed is the cleanup's grandparent's color and the catchendpad
will get that color from the catchpad linkage already.  We already had
this exclusion for invokes, but were missing it for
cleanupendpad/cleanupret.

Also add a missing line that tags cleanupendpads' states in the
EHPadStateMap, without with lowering invokes that target cleanupendpads
which unwind to other handlers (and so don't have the -1 state) will fail.

This fixes the reduced IR repro in PR25163.


Reviewers: majnemer, andrew.w.kaylor, rnk

Subscribers: llvm-commits

Differential Revision: http://reviews.llvm.org/D13797

llvm-svn: 250534
2015-10-16 18:08:16 +00:00
Charlie Turner 434d4599d4 [AArch64] Implement vector splitting on UADDV.
Summary: Fixes PR25056.

Reviewers: mcrosier, junbuml, jmolloy

Subscribers: aemerson, rengolin, llvm-commits

Differential Revision: http://reviews.llvm.org/D13466

llvm-svn: 250520
2015-10-16 15:38:25 +00:00
Craig Topper 09b6598572 [X86] Add fxsr feature flag for fxsave/fxrestore instructions.
llvm-svn: 250497
2015-10-16 06:03:09 +00:00
JF Bastien 1d20a5e9e8 WebAssembly: update syntax
Summary:
Follow the same syntax as for the spec repo. Both have evolved slightly
independently and need to converge again.

This, along with wasmate changes, allows me to do the following:

  echo "int add(int a, int b) { return a + b; }" > add.c
  ./out/bin/clang -O2 -S --target=wasm32-unknown-unknown add.c -o add.wack
  ./experimental/prototype-wasmate/wasmate.py add.wack > add.wast
  ./sexpr-wasm-prototype/out/sexpr-wasm add.wast -o add.wasm
  ./sexpr-wasm-prototype/third_party/v8-native-prototype/v8/v8/out/Release/d8 -e "print(WASM.instantiateModule(readbuffer('add.wasm'), {print:print}).add(42, 1337));"

As you'd expect, the d8 shell prints out the right value.

Reviewers: sunfish

Subscribers: jfb, llvm-commits, dschuff

Differential Revision: http://reviews.llvm.org/D13712

llvm-svn: 250480
2015-10-16 00:53:49 +00:00
JF Bastien 2cdd5e4710 x86: preserve flags when folding atomic operations
D4796 taught LLVM to fold some atomic integer operations into a single
instruction. The pattern was unaware that the instructions clobbered
flags. I fixed some of this issue in D13680 but had missed INC/DEC.

This patch adds the missing EFLAGS definition.

llvm-svn: 250438
2015-10-15 18:24:52 +00:00
Kevin B. Smith 89760f04f9 Change test to use FileCheck rather than grep.
Differential Revision: http://reviews.llvm.org/D13751

llvm-svn: 250431
2015-10-15 17:05:12 +00:00
JF Bastien 5b327712b0 x86 FP atomic codegen: don't drop globals, stack
Summary:
x86 codegen is clever about generating good code for relaxed
floating-point operations, but it was being silly when globals and
immediates were involved, forgetting where the global was and
loading/storing from/to the wrong place. The same applied to hard-coded
address immediates.

Don't let it forget about the displacement.

This fixes https://llvm.org/bugs/show_bug.cgi?id=25171

A very similar bug when doing floating-points atomics to the stack is
also fixed by this patch.

This fixes https://llvm.org/bugs/show_bug.cgi?id=25144

Reviewers: pete

Subscribers: llvm-commits, majnemer, rsmith

Differential Revision: http://reviews.llvm.org/D13749

llvm-svn: 250429
2015-10-15 16:46:29 +00:00
Daniel Sanders 8008de5551 [mips][mips16] MIPS16 is not a CPU/Architecture but is an ASE.
Summary:
The -mcpu=mips16 option caused the Integrated Assembler to crash because
it couldn't figure out the architecture revision number to write to the
.MIPS.abiflags section. This CPU definition has been removed because, like
microMIPS, MIPS16 is an ASE to a base architecture.

Reviewers: vkalintiris

Subscribers: rkotler, llvm-commits, dsanders

Differential Revision: http://reviews.llvm.org/D13656

llvm-svn: 250407
2015-10-15 14:34:23 +00:00
Igor Breger d7bae451de AVX512: Implemented DAG lowering for shuff62x2/shufi62x2 instructions ( shuffle packed values at 128-bit granularity )
Differential Revision: http://reviews.llvm.org/D13648

llvm-svn: 250400
2015-10-15 13:29:07 +00:00
Igor Breger b4bb190eed AVX512: Implemented encoding and intrinsics for vpternlogd/q.
Differential Revision: http://reviews.llvm.org/D13768

llvm-svn: 250396
2015-10-15 12:33:24 +00:00
Elena Demikhovsky ecff21b297 AVX-512: Fixed a bug in shuffle lowering 32-bit mode
AVX-512 bit shuffle fails on 32 bit since we create a vector of 64-bit constants.
I split 8x64-bit const vector to 16x32 on 32-bit mode.

Differential Revision: http://reviews.llvm.org/D13644

llvm-svn: 250390
2015-10-15 11:35:33 +00:00
Andrea Di Biagio 6a61cecef0 [x86] Merge test pr24562.ll into x86-fold-pshufb.ll. NFC.
llvm-svn: 250387
2015-10-15 09:54:25 +00:00
Akira Hatanaka 8ad7399f8e [MachO] Stop generating *coal* sections.
Recommit r250342: move coal-sections-powerpc.s to subdirectory for powerpc.

Some background on why we don't have to use *coal* sections anymore:
Long ago when C++ was new and "weak" had not been standardized, an attempt was
made in cctools to support C++ inlines that can be coalesced by putting them
into their own section (TEXT/textcoal_nt instead of TEXT/text).

The current macho linker supports the weak-def bit on any symbol to allow it to
be coalesced, but the compiler still puts weak-def functions/data into alternate
section names, which the linker must map back to the base section name.

This patch makes changes that are necessary to prevent the compiler from using
the "coal" sections and have it use the non-coal sections instead when the
target architecture is not powerpc:

TEXT/textcoal_nt instead use TEXT/text
TEXT/const_coal instead use TEXT/const
DATA/datacoal_nt instead use DATA/data

If the target is powerpc, we continue to use the *coal* sections since anyone
targeting powerpc is probably using an old linker that doesn't have support for
the weak-def bits.

Also, have the assembler issue a warning if it encounters a *coal* section in
the assembly file and inform the users to use the non-coal sections instead.

rdar://problem/14265330

Differential Revision: http://reviews.llvm.org/D13188

llvm-svn: 250370
2015-10-15 05:28:38 +00:00
Quentin Colombet 5084e44d71 [ARM] Make sure we do not dereference the end iterator when accessing debug
information.
Although the problem was always here, it would only be exposed when
shrink-wrapping is enable.

rdar://problem/23110493

llvm-svn: 250352
2015-10-15 00:41:26 +00:00
Akira Hatanaka 276332b47f Revert r250349.
Test case coal-sections-powerpc.s is still failing on some buildbots.

llvm-svn: 250351
2015-10-15 00:11:03 +00:00
Akira Hatanaka 1cea644114 [MachO] Stop generating *coal* sections.
Recommit r250342: add -arch=ppc32 to the RUN lines of powerpc tests.

Some background on why we don't have to use *coal* sections anymore:
Long ago when C++ was new and "weak" had not been standardized, an attempt was
made in cctools to support C++ inlines that can be coalesced by putting them
into their own section (TEXT/textcoal_nt instead of TEXT/text).

The current macho linker supports the weak-def bit on any symbol to allow it to
be coalesced, but the compiler still puts weak-def functions/data into alternate
section names, which the linker must map back to the base section name.

This patch makes changes that are necessary to prevent the compiler from using
the "coal" sections and have it use the non-coal sections instead when the
target architecture is not powerpc:

TEXT/textcoal_nt instead use TEXT/text
TEXT/const_coal instead use TEXT/const
DATA/datacoal_nt instead use DATA/data

If the target is powerpc, we continue to use the *coal* sections since anyone
targeting powerpc is probably using an old linker that doesn't have support for
the weak-def bits.

Also, have the assembler issue a warning if it encounters a *coal* section in
the assembly file and inform the users to use the non-coal sections instead.

rdar://problem/14265330

Differential Revision: http://reviews.llvm.org/D13188

llvm-svn: 250349
2015-10-14 23:48:10 +00:00
Akira Hatanaka d58d347e42 Revert r250342.
Investigate why coal-sections-powerpc.s is failing on some buildbots.

llvm-svn: 250346
2015-10-14 23:29:10 +00:00
Akira Hatanaka c078ae3e4f [MachO] Stop generating *coal* sections.
Some background on why we don't have to use *coal* sections anymore:
Long ago when C++ was new and "weak" had not been standardized, an attempt was
made in cctools to support C++ inlines that can be coalesced by putting them
into their own section (TEXT/textcoal_nt instead of TEXT/text).

The current macho linker supports the weak-def bit on any symbol to allow it to
be coalesced, but the compiler still puts weak-def functions/data into alternate
section names, which the linker must map back to the base section name.

This patch makes changes that are necessary to prevent the compiler from using
the "coal" sections and have it use the non-coal sections instead when the
target architecture is not powerpc:

TEXT/textcoal_nt instead use TEXT/text
TEXT/const_coal instead use TEXT/const
DATA/datacoal_nt instead use DATA/data

If the target is powerpc, we continue to use the *coal* sections since anyone
targeting powerpc is probably using an old linker that doesn't have support for
the weak-def bits.

Also, have the assembler issue a warning if it encounters a *coal* section in
the assembly file and inform the users to use the non-coal sections instead.

rdar://problem/14265330

Differential Revision: http://reviews.llvm.org/D13188

llvm-svn: 250342
2015-10-14 22:45:36 +00:00
Sanjay Patel e2b528074d add x86 codegen tests for 'add nsw' followed by 'sext'
llvm-svn: 250332
2015-10-14 21:47:03 +00:00
Bill Schmidt 048cc97fb1 [PowerPC] Fix invalid lxvdsx optimization (PR25157)
PR25157 identifies a bug where a load plus a vector shuffle is
incorrectly converted into an LXVDSX instruction.  That optimization
is only valid if the load is of a doubleword, and in the noted case,
it was not.  This corrects that problem.

Joint patch with Eric Schweitz, who provided the bugpoint-reduced test
case.

llvm-svn: 250324
2015-10-14 20:45:00 +00:00
Andrea Di Biagio c47edbef4c [x86][FastISel] Teach how to select nontemporal stores.
This patch teaches x86 fast-isel how to select nontemporal stores.

On x86, we can use MOVNTI for nontemporal stores of doublewords/quadwords.
Instructions (V)MOVNTPS/PD/DQ can be used for SSE2/AVX aligned nontemporal
vector stores.

Before this patch, fast-isel always selected 'movd/movq' instead of 'movnti'
for doubleword/quadword nontemporal stores. In the case of nontemporal stores
of aligned vectors, fast-isel always selected movaps/movapd/movdqa instead of
movntps/movntpd/movntdq.

With this patch, if we use SSE2/AVX intrinsics for nontemporal stores we now
always get the expected (V)MOVNT instructions.
The lack of fast-isel support for nontemporal stores was spotted when analyzing
the -O0 codegen for nontemporal stores.

Differential Revision: http://reviews.llvm.org/D13698

llvm-svn: 250285
2015-10-14 10:03:13 +00:00
Joseph Tremoulet 28c89bbb36 [WinEH] Add CoreCLR EH table emission
Summary:
Emit the handler and clause locations immediately after the standard
xdata.
Clauses are emitted in the same order and format used to communiate them
to the CLR Execution Engine.
Add a lit test to verify correct table generation on a small but
interesting example function.

Reviewers: majnemer, andrew.w.kaylor, rnk

Subscribers: pgavlin, AndyAyers, llvm-commits

Differential Revision: http://reviews.llvm.org/D13451

llvm-svn: 250219
2015-10-13 20:18:27 +00:00
Matt Arsenault e5d9515fb7 DAGCombiner: Don't stop finding better chain on 2 aliases
The comment says this was stopped because it was unlikely to be
profitable. This is not true if you want to combine vector loads
with multiple components.

For a simple case that looks like

t0 = load t0 ...
t1 = load t0 ...
t2 = load t0 ...
t3 = load t0 ...

t4 = store t0:1, t0:1
  t5 = store t4, t1:0
    t6 = store t5, t2:0
	  t7 = store t6, t3:0

We want to get all of these stores onto a chain
that is a TokenFactor of these N loads. This mostly
solves the AMDGPU merge-stores.ll regressions
with -combiner-alias-analysis for merging vector
stores of vector loads.

llvm-svn: 250138
2015-10-13 00:49:00 +00:00
JF Bastien 986ed68eed x86: preserve flags when folding atomic operations
Summary:
D4796 taught LLVM to fold some atomic integer operations into a single
instruction. The pattern was unaware that the instructions clobbered
flags.

This patch adds the missing EFLAGS definition.

Floating point operations don't set flags, the subsequent fadd
optimization is therefore correct. The same applies for surrounding
load/store optimizations.

Reviewers: rsmith, rtrieu

Subscribers: llvm-commits, reames, morisset

Differential Revision: http://reviews.llvm.org/D13680

llvm-svn: 250135
2015-10-13 00:28:47 +00:00
Matt Arsenault 61dc235f20 DAGCombiner: Combine extract_vector_elt from build_vector
This basic combine was surprisingly missing.
AMDGPU legalizes many operations in terms of 32-bit vector components,
so not doing this results in many extra copies and subregister extracts
that need to be cleaned up later.

InstCombine already does this for the hasOneUse case. The target hook
is to fix a handful of tests which break (e.g. ARM/vmov.ll) which turn
from a vector materialize repeated immediate instruction to a constant
vector load with more scalar copies from it.

llvm-svn: 250129
2015-10-12 23:59:50 +00:00
Cong Hou bf22f5063a Assign correct edge weights to unwind destinations when lowering invoke statement.
When lowering invoke statement, all unwind destinations are directly added as successors of call site block, and the weight of those new edges are not assigned properly. Actually, default weight 16 are used for those edges. This patch calculates the proper edge weights for those edges when collecting all unwind destinations.

Differential revision: http://reviews.llvm.org/D13354

llvm-svn: 250119
2015-10-12 23:02:58 +00:00
Reid Kleckner 4a5f35c0ae Make Win64 localescape offsets FP relative instead of SP relative
We made them SP relative back in March (r233137) because that's the
value the runtime passes to EH functions. With the new cleanuppad IR,
funclets adjust their frame argument from SP to FP, so our offsets
should now be FP-relative.

llvm-svn: 250088
2015-10-12 19:43:34 +00:00
Andrea Di Biagio b0fe4eb199 [x86] Fix wrong lowering of vsetcc nodes (PR25080).
Function LowerVSETCC (in X86ISelLowering.cpp) worked under the wrong
assumption that for non-AVX512 targets, the source type and destination type
of a type-legalized setcc node were always the same type.

This assumption was unfortunately incorrect; the type legalizer is not always
able to promote the return type of a setcc to the same type as the first
operand of a setcc.

In the case of a vsetcc node, the legalizer firstly checks if the first input
operand has a legal type. If so, then it promotes the return type of the vsetcc
to that same type. Otherwise, the return type is promoted to the 'next legal
type', which, for vectors of MVT::i1 is always a 128-bit integer vector type.

Example (-mattr=+avx):

  %0 = trunc <8 x i32> %a to <8 x i23>
  %1 = icmp eq <8 x i23> %0, zeroinitializer

The initial selection dag for the code above is:

v8i1 = setcc t5, t7, seteq:ch
  t5: v8i23 = truncate t2
    t2: v8i32,ch = CopyFromReg t0, Register:v8i32 %vreg1
    t7: v8i32 = build_vector of all zeroes.

The type legalizer would firstly check if 't5' has a legal type. If so, then it
would reuse that same type to promote the return type of the setcc node.
Unfortunately 't5' is of illegal type v8i23, and therefore it cannot be used to
promote the return type of the setcc node. Consequently, the setcc return type
is promoted to v8i16. Later on, 't5' is promoted to v8i32 thus leading to the
following dag node:
  v8i16 = setcc t32, t25, seteq:ch

  where t32 and t25 are now values of type v8i32.

Before this patch, function LowerVSETCC would have wrongly expanded the setcc
to a single X86ISD::PCMPEQ. Surprisingly, ISel was still able to match an
instruction. In our case, ISel would have matched a VPCMPEQWrr:
  t37: v8i16 = X86ISD::VPCMPEQWrr t36, t25

However, t36 and t25 are both VR256, while the result type is instead of class
VR128. This inconsistency ended up causing the insertion of COPY instructions
like this:
  %vreg7<def> = COPY %vreg3; VR128:%vreg7 VR256:%vreg3

Which is an invalid full copy (not a sub register copy).
Eventually, the backend would have hit an UNREACHABLE "Cannot emit physreg copy
instruction" in the attempt to expand the malformed pseudo COPY instructions.

This patch fixes the problem adding the missing logic in LowerVSETCC to handle
the corner case of a setcc with 128-bit return type and 256-bit operand type.

This problem was originally reported by Dimitry as PR25080. It has been latent
for a very long time. I have added the minimal reproducible from that bugzilla
as test setcc-lowering.ll.

Differential Revision: http://reviews.llvm.org/D13660

llvm-svn: 250085
2015-10-12 19:22:30 +00:00
Jun Bum Lim 54f3ddfbe2 [AArch64]Fix bug in function names in test case
Functions in this test case need to be renamed as its names are the same
as the instructions we are comparing with.

llvm-svn: 250052
2015-10-12 15:34:52 +00:00
Daniel Sanders 332cef6c5f [mips] Whitespace cleanup in MIPS16 tests to reduce noise in following changes. NFC.
Mostly tabs -> spaces and double spacing.

llvm-svn: 250041
2015-10-12 14:16:52 +00:00
Daniel Sanders 2fb8564d99 [mips] Handle undef when extracting subregs from FP64 registers.
Summary:
This removes unnecessary instructions when extracting from an undefined register
and also fixes a crash for O32 when passing undef to a double argument in
held in integer registers.

Reviewers: vkalintiris

Subscribers: llvm-commits, zoran.jovanovic, petarj

Differential Revision: http://reviews.llvm.org/D13467

llvm-svn: 250039
2015-10-12 13:55:44 +00:00
Amjad Aboud 1db6d7af46 [X86] Add XSAVE intrinsic family
Add intrinsics for the
  XSAVE instructions (XSAVE/XSAVE64/XRSTOR/XRSTOR64)
  XSAVEOPT instructions (XSAVEOPT/XSAVEOPT64)
  XSAVEC instructions (XSAVEC/XSAVEC64)
  XSAVES instructions (XSAVES/XSAVES64/XRSTORS/XRSTORS64)

Differential Revision: http://reviews.llvm.org/D13012

llvm-svn: 250029
2015-10-12 11:47:46 +00:00
Andrea Di Biagio a0922ed8fe [x86] PR24562: fix incorrect folding of PSHUFB nodes with a mask where all indices have the most significant bit set.
This patch fixes a problem in function 'combineX86ShuffleChain' that causes a
chain of shuffles to be wrongly folded away when the combined shuffle mask has
only one element.

We may end up with a combined shuffle mask of one element as a result of
multiple calls to function 'canWidenShuffleElements()'.
Function canWidenShuffleElements attempts to simplify a shuffle mask by widening
the size of the elements being shuffled.
For every pair of shuffle indices, function canWidenShuffleElements checks if
indices refer to adjacent elements. If all pairs refer to "adjacent" elements
then the shuffle mask is safely widened. As a consequence of widening, we end up
with a new shuffle mask which is half the size of the original shuffle mask.

The byte shuffle (pshufb) from test pr24562.ll has a mask of all SM_SentinelZero
indices. Function canWidenShuffleElements would combine each pair of
SM_SentinelZero indices into a single SM_SentinelZero index. So, in a
logarithmic number of steps (4 in this case), the pshufb mask is simplified to
a mask with only one index which is equal to SM_SentinelZero.

Before this patch, function combineX86ShuffleChain wrongly assumed that a mask
of size one is always equivalent to an identity mask. So, the entire shuffle
chain was just folded away as the combined shuffle mask was treated as a no-op
mask.

With this patch we know check if the only element of a combined shuffle mask is
SM_SentinelZero. In case, we propagate a zero vector.

Differential Revision: http://reviews.llvm.org/D13364

llvm-svn: 250027
2015-10-12 11:25:41 +00:00
Simon Pilgrim d45c88bbb5 [DAGCombiner] Improved FMA combine support for vectors
Enabled constant canonicalization for all constants.

Improved combining of constant vectors.

llvm-svn: 249993
2015-10-11 19:48:12 +00:00
Craig Topper 87990ee4ec [X86] Remove special validation for INT immediate operand from AsmParser. Instead mark its operand type as u8imm which will cause it to fail to match. This is more consistent with other instruction behavior.
This also fixes a bug where negative immediates below -128 were not being reported as errors.

llvm-svn: 249989
2015-10-11 18:27:24 +00:00
Simon Pilgrim 52d47e5704 [X86][XOP] Added support for the lowering of 128-bit vector integer comparisons to XOP PCOM/PCOMU instructions.
The XOP vector integer comparisons can deal with all signed/unsigned comparison cases directly and can be easily commuted as well (D7646).

llvm-svn: 249976
2015-10-11 14:15:17 +00:00
Simon Pilgrim bdbf839a3b [X86][SSE] Vector signed/unsigned integer compare tests.
llvm-svn: 249954
2015-10-10 22:21:05 +00:00
Jonas Paulsson 28fa48de32 [SystemZ] CodeGen/SystemZ/asm-18.ll run with -verify-machineinstrs
Relates to the fixes of r249811.

llvm-svn: 249946
2015-10-10 07:20:23 +00:00
Jonas Paulsson 63a2b6862e [SystemZ] Fixes in the backend I/R.
expandPostRAPseudo():
STX -> 2 * STD: The first STD should not have the kill flag set for the address.

SystemZElimCompare:
BRC -> BRCT conversion: Don't forget to remove the CC<use,kill> operand.

Needed to make SystemZ/asm-17.ll pass with -verify-machineinstrs, which
now runs with this flag.

Reviewed by Ulrich Weigand.

llvm-svn: 249945
2015-10-10 07:14:24 +00:00
Reid Kleckner 14e773500e [WinEH] Delete the old landingpad implementation of Windows EH
The new implementation works at least as well as the old implementation
did.

Also delete the associated preparation tests. They don't exercise
interesting corner cases of the new implementation. All the codegen
tests of the EH tables have already been ported.

llvm-svn: 249918
2015-10-09 23:34:53 +00:00
Reid Kleckner eb7cd6c889 [SEH] Update SEH codegen tests to use the new IR
Also Fix a buglet where SEH tables had ranges that spanned funclets.

The remaining tests using the old landingpad IR are preparation tests,
and will be deleted along with the old preparation.

llvm-svn: 249917
2015-10-09 23:05:54 +00:00
David Majnemer 35d27b21a1 [WinEH] Insert the catchpad return before CSR restoration
x64 catchpads use rax to inform the unwinder where control should go
next.  However, we must initialize rax before the epilogue sequence so
as to not perturb the unwinder.

llvm-svn: 249910
2015-10-09 22:18:45 +00:00
James Y Knight 692e037499 Fix assert when emitting llvm.pow.f86.
This occurred due to introducing the invalid i64 type after type
legalization had already finished, in an attempt to workaround bitcast
f64 -> v2i32 not doing constant folding.

The *right* thing is to actually fix bitcast, but that has other
complications. So, for now, just get rid of the broken workaround, and
check in a test-case showing that it doesn't crash, with TODOs for
emitting proper code.

llvm-svn: 249908
2015-10-09 21:36:19 +00:00
Reid Kleckner e1c8a7f9c7 [SEH] Fix _except_handler4 table base states
We got them right for the old IR, but not with funclets.  Port the old
test to the new IR and fix the code.

llvm-svn: 249906
2015-10-09 21:27:28 +00:00
Reid Kleckner d880dc7509 [SEH] Remember to emit the last invoke range for SEH
This wasn't very observable in execution tests, because usually there is
an invoke in the catchpad that unwinds the the catchendpad but never
actually throws.

llvm-svn: 249898
2015-10-09 20:39:39 +00:00
James Y Knight 5b8217bc05 Fix assert in X86 backend.
When running combine on an extract_vector_elt, it wants to look through
a bitcast to check if the argument to the bitcast was itself an
extract_vector_elt with particular operands.

However, it called getOperand() on the argument to the bitcast *before*
checking that the opcode was EXTRACT_VECTOR_ELT, assert-failing if there
were zero operands for the actual opcode.

Fix, and add trivial test.

llvm-svn: 249891
2015-10-09 20:10:14 +00:00
Dan Gohman ee1588ce96 [WebAssembly] Rename floating-point operators to match their spec names.
llvm-svn: 249859
2015-10-09 17:50:00 +00:00
Jun Bum Lim 0aace13d18 Improve ISel across lane float min/max reduction
In vectorized float min/max reduction code, the final "reduce" step
is sub-optimal. In AArch64, this change wll combine :

  svn0 = vector_shuffle t0, undef<2,3,u,u>
  fmin = fminnum t0,svn0
  svn1 = vector_shuffle fmin, undef<1,u,u,u>
  cc = setcc fmin, svn1, ole
  n0 = extract_vector_elt cc, #0
  n1 = extract_vector_elt fmin, #0
  n2 = extract_vector_elt fmin, #1
  result = select n0, n1,n2
into :
  result = llvm.aarch64.neon.fminnmv t0

This change extends r247575.

llvm-svn: 249834
2015-10-09 14:11:25 +00:00
Nemanja Ivanovic d389657399 Vector element extraction without stack operations on Power 8
This patch corresponds to review:
http://reviews.llvm.org/D12032

This patch builds onto the patch that provided scalar to vector conversions
without stack operations (D11471).
Included in this patch:

    - Vector element extraction for all vector types with constant element number
    - Vector element extraction for v16i8 and v8i16 with variable element number
    - Removal of some unnecessary COPY_TO_REGCLASS operations that ended up
      unnecessarily moving things around between registers

Not included in this patch (will be in upcoming patch):

    - Vector element extraction for v4i32, v4f32, v2i64 and v2f64 with
      variable element number
    - Vector element insertion for variable/constant element number

Testing is provided for all extractions. The extractions that are not
implemented yet are just placeholders.

llvm-svn: 249822
2015-10-09 11:12:18 +00:00
Saleem Abdulrasool 1825fac3c9 ARM: tweak WoA frame lowering
Accept r11 when targeting Windows on ARM rather than just low registers.
Because we are in a thumb-2 only mode, this may be slightly more expensive in
code size, but results in better code for the environment since it spills the
frame register, which is generally desired for fast stack walking as per the
ABI.

llvm-svn: 249804
2015-10-09 03:19:03 +00:00
Reid Kleckner ae44e871cd Revert "Revert "Revert r248959, "[WinEH] Emit int3 after noreturn calls on Win64"""
This reverts commit r249794.

Apparently my checkouts are full of unexpected surprises today.

llvm-svn: 249796
2015-10-09 01:13:17 +00:00
Reid Kleckner b510401785 Revert "Revert r248959, "[WinEH] Emit int3 after noreturn calls on Win64""
This reverts commit r249032.

TODO write commit msg

llvm-svn: 249794
2015-10-09 01:11:37 +00:00
Joseph Tremoulet 676e5cf07f [WinEH] Fix cleanup state numbering
Summary:
 - Recurse from cleanupendpads to their cleanuppads, to make sure the
   cleanuppad is visited if it has a cleanupendpad but no cleanupret.
 - Check for and avoid double-processing cleanuppads, to allow for them to
   have multiple cleanuprets (plus cleanupendpads).
 - Update Cxx state numbering to visit toplevel cleanupendpads and to
   recurse from cleanupendpads to their preds, to ensure we number any
   funclets in inlined cleanups.  SEH state numbering already did this.

Reviewers: rnk

Subscribers: llvm-commits

Differential Revision: http://reviews.llvm.org/D13374

llvm-svn: 249792
2015-10-09 00:46:08 +00:00
Reid Kleckner ebef256269 [SEH] Fix llvm.eh.exceptioncode fast register allocation assertion
I called the wrong MachineBasicBlock::addLiveIn() overload.

llvm-svn: 249786
2015-10-09 00:15:13 +00:00
Eric Christopher ab2241f1b8 Remove a '#' so that we can check either form for the various targets.
llvm-svn: 249734
2015-10-08 20:18:15 +00:00
Eric Christopher 11e5983658 Move the MMX subtarget feature out of the SSE set of features and into
its own variable.

This is needed so that we can explicitly turn off MMX without turning
off SSE and also so that we can diagnose feature set incompatibilities
that involve MMX without SSE.

Rationale:

// sse3
__m128d test_mm_addsub_pd(__m128d A, __m128d B) {
  return _mm_addsub_pd(A, B);
}

// mmx
void shift(__m64 a, __m64 b, int c) {
  _mm_slli_pi16(a, c);
  _mm_slli_pi32(a, c);
  _mm_slli_si64(a, c);
  _mm_srli_pi16(a, c);
  _mm_srli_pi32(a, c);
  _mm_srli_si64(a, c);
  _mm_srai_pi16(a, c);
  _mm_srai_pi32(a, c);
}

clang -msse3 -mno-mmx file.c -c

For this code we should be able to explicitly turn off MMX
without affecting the compilation of the SSE3 function and then
diagnose and error on compiling the MMX function.

This matches the existing gcc behavior and follows the spirit of
the SSE/MMX separation in llvm where we can (and do) turn off
MMX code generation except in the presence of intrinsics.

Updated a couple of tests, but primarily tested with a couple of tests
for turning on only mmx and only sse.

This is paired with a patch to clang to take advantage of this behavior.

llvm-svn: 249731
2015-10-08 20:10:06 +00:00
Alexei Starovoitov 87f83e6926 [bpf] Do not expand UNDEF SDNode during insn selection lowering
o Before this patch, BPF backend will expand UNDEF node
    to i64 constant 0.
  o For second pass of dag combiner, legalizer will run through
    each to-be-processed dag node.
  o If any new SDNode is generated and has an undef operand,
    dag combiner will put undef node, newly-generated constant-0 node,
    and any node which uses these nodes in the working list.
  o During this process, it is possible undef operand is
    generated again, and this will form an infinite loop
    for dag combiner pass2.
  o This patch allows UNDEF to be a legal type.

Signed-off-by: Yonghong Song <yhs@plumgrid.com>
Signed-off-by: Alexei Starovoitov <ast@plumgrid.com>
llvm-svn: 249718
2015-10-08 18:52:40 +00:00
Reid Kleckner b2244cb8f0 [WinEH] Relax assertion in the presence of stack realignment
The code is correct as is, but we should test it.

llvm-svn: 249715
2015-10-08 18:41:52 +00:00
Ulrich Weigand f4d14f781f [SystemZ] Fix another assertion failure in tryBuildVectorShuffle
This fixes yet another scenario where tryBuildVectorShuffle would
attempt to create a BUILD_VECTOR node with an invalid combination
of types.  This can happen if the incoming BUILD_VECTOR has elements
of a type different from the vector element type, which is allowed
in certain cases as long as they are all the same type.

When one of these elements is used in the residual vector, and
UNDEF elements are added to fill up the residual vector, those
UNDEFs then have to use the type of the original element, not
the vector element type, or else the resulting BUILD_VECTOR
will have an invalid type combination.

llvm-svn: 249706
2015-10-08 17:46:59 +00:00
Frederic Riss 263b772bda [X86] Disable X86CallFrameOptimization on Darwin in presence of EH
We emit 1 compact unwind encoding per function, and this can’t represent
the varying stack pointer that will be generated by X86CallFrameOptimization.
Disable the optimization on Darwin.

(It might be possible to split the function into multiple ranges
and emit 1 compact unwind info per range. The compact unwind emission
code isn’t ready for that and this kind of info certainly isn’t
tested/used anywhere. It might be worth exploring this path if we want
to get the space savings at some point though)

llvm-svn: 249694
2015-10-08 15:45:08 +00:00
Igor Breger defab3c1ef AVX512: vpextrb/w/d/q and vpinsrb/w/d/q implementation.
This instructions doesn't have intrincis.
Added tests for lowering and encoding.

Differential Revision: http://reviews.llvm.org/D12317

llvm-svn: 249688
2015-10-08 12:55:01 +00:00
Michael Kuperstein 04e79329d0 [X86] Fix wrong treatment of multi-lane blends in BUILD_VECTORtoBlendMask()
This fixes two separate bugs:
1) The mask for the high lane was not set correctly. That fixes PR24532.
2) The transformation should bail out if it believes it involves more than
2 lanes, as it does not currently do anything sensible in this case.

Differential Revision: http://reviews.llvm.org/D13505

llvm-svn: 249669
2015-10-08 08:13:02 +00:00
Michael Kuperstein 2b3c16ca17 Do not assert on first non-prologue instruction being a CFI directive.
llvm-svn: 249668
2015-10-08 07:48:49 +00:00
Jonas Paulsson 5d3fbd3733 [SystemZ] SystemZElimCompare pass improved.
Compare elimination extended to recognize load-and-test instructions used
for comparison and eliminate them the same way as with compare instructions.

Test case fp-cmp-05.ll updated to expect optimized results now also for z13.

The order of instruction shortening and compare elimination passes have been
changed so that opcodes do not have to be handled in both passes.

Reviewed by Ulrich Weigand.

llvm-svn: 249666
2015-10-08 07:40:23 +00:00
Jonas Paulsson 7c5ce10a07 [SystemZ] Use load-and-test for fp compare with 0 if vector support is present.
Since the LTxBRCompare instructions can't be used with vector registers, a
normal load-and-test instruction (with a modelled def operand) is used instead.

Reviewed by Ulrich Weigand.

llvm-svn: 249664
2015-10-08 07:40:16 +00:00
Reid Kleckner 94fe836afa [WinEH] Add missing test case for llvm.eh.exceptioncode
llvm-svn: 249638
2015-10-07 23:55:06 +00:00
Reid Kleckner 97797419e6 [WinEH] Fix 32-bit funclet epilogues in the presence of dynamic allocas
In particular, passing non-trivially copyable objects by value on win32
uses a dynamic alloca (inalloca). We would clobber ESP in the epilogue
and end up returning to outer space.

llvm-svn: 249637
2015-10-07 23:55:01 +00:00
David Majnemer 6af5f82c20 [WinEH] Refer to filter funclets using their symbol-table symbol
The relocation for the filter funclet will be against a symbol table
entry for a function instead of the section, making it easier to
understand what is going on.

llvm-svn: 249621
2015-10-07 21:34:00 +00:00
Reid Kleckner 70bf6bb5e6 [WinEH] Undo the effect of r249578 for 32-bit
The __CxxFrameHandler3 tables for 32-bit are supposed to hold stack
offsets relative to EBP, not ESP. I blindly updated the win-catchpad.ll
test case, and immediately noticed that 32-bit catching stopped working.

While I'm at it, move the frame index to frame offset WinEH table logic
out of PEI.  PEI shouldn't have to know about WinEHFuncInfo. I realized
we can calculate frame index offsets just fine from the table printer.

llvm-svn: 249618
2015-10-07 21:13:15 +00:00
David Majnemer c289c9ff55 [WinEH] Remove unreachable blocks before preparation
We remove unreachable blocks because it is pointless to consider them
for coloring.  However, we still had stale pointers to these blocks in
some data structures after we removed them from the function.

Instead, remove the unreachable blocks before attempting to do anything
with the function.

This fixes PR25099.

llvm-svn: 249617
2015-10-07 21:08:25 +00:00
Joseph Tremoulet 39234fc67e [WinEH] Set NoModuleLevelChanges in clone flags
Summary:
This is necessary to keep the cloner from making bogus copies of debug
metadata attached to the IR it is cloning.
Also, avoid running RemapInstruction over all instructions in the common
case that no cloning was performed.

Reviewers: rnk, andrew.w.kaylor, majnemer

Subscribers: llvm-commits

Differential Revision: http://reviews.llvm.org/D13514

llvm-svn: 249591
2015-10-07 19:29:56 +00:00
Kevin B. Smith 99e8c0fffb [X86]Update test to use FileCheck.
Updates this test to use FileCheck and a single llc invocation rather than
3 llc invocations and grep.

llvm-svn: 249583
2015-10-07 18:21:41 +00:00
Chad Rosier 7c6ac2b8f9 [AArch64] Fold a floating-point divide by power of two into fp conversion.
Part of http://reviews.llvm.org/D13442

llvm-svn: 249579
2015-10-07 17:51:37 +00:00
Reid Kleckner 33bd2d99d8 [WinEH] Fix two minor issues in __CxxFrameHandler3 tables
There was an off-by-one bug in ip2state tables which manifested when one
call immediately preceded the try-range of the next. The return address
of the previous call would appear to be within the try range of the next
scope, resulting in extra destructors or catches running.

We also computed the wrong offset for catch parameter stack objects. The
offset should be from RSP, not from RBP.

llvm-svn: 249578
2015-10-07 17:49:32 +00:00
Chad Rosier fa30c9b436 [AArch64] Fold a floating-point multiply by power of two into fp conversion.
Part of http://reviews.llvm.org/D13442

llvm-svn: 249576
2015-10-07 17:39:18 +00:00
Chad Rosier 169865ffda [ARM] Promote helper function to SelectionDAG.
I'll be using the function in a similar combine for AArch64.  The helper was
also improved to handle undef values.

Part of http://reviews.llvm.org/D13442

llvm-svn: 249572
2015-10-07 17:28:58 +00:00
Oliver Stannard d3d114ba54 [ARM] Use correct half-precision functions in EABI mode
The ARM RTABI defines the half- to single-precision float conversion functions
with an __aeabi prefix, but libgcc only has them with a __gnu prefix. Therefore
we need to emit the __aeabi version when compiling with an eabi or eabihf
triple, and the __gnu version with a gnueabi or gnueabihf triple.

llvm-svn: 249565
2015-10-07 16:58:49 +00:00
Chad Rosier 17436bf64e [ARM] Prevent PerformVDIVCombine from combining a vcvt/vdiv with 8 lanes.
This would result in a crash since the vcvt used does not support v8i32 types.

llvm-svn: 249560
2015-10-07 16:15:40 +00:00
Jeroen Ketema aebca09543 [ARM][AArch64] Only lower to interleaved load/store if the target has NEON
Without an additional check for NEON, the compiler crashes during
legalization of NEON ldN/stN.

Differential Revision: http://reviews.llvm.org/D13508

llvm-svn: 249550
2015-10-07 14:53:29 +00:00
Michael Kuperstein 259f1508f0 [X86] Emit .cfi_escape GNU_ARGS_SIZE when adjusting the stack before calls
When outgoing function arguments are passed using push instructions, and EH
is enabled, we may need to indicate to the stack unwinder that the stack
pointer was adjusted before the call.

This should fix the exception handling issues in PR24792.

Differential Revision: http://reviews.llvm.org/D13132

llvm-svn: 249522
2015-10-07 07:01:31 +00:00
Eric Christopher ab2802c58f Update test to use FileCheck and clean up run lines to match the
expected behavior.

llvm-svn: 249498
2015-10-07 01:21:49 +00:00
Matt Arsenault 284192730a AMDGPU: Use explicit register size indirect pseudos
This stops using an unknown reg class operand.

Currently build_vector selection has a broken looking check
where it tries to use a VGPR reg class and an SGPR one if it
sees an SGPR use.

With the source operand has an explicit VGPR class,
illegal copies will be inserted that SIFixSGPRCopies will take care
of normally later, which will allow removing the weird check
of build_vector users. Without this, when removed v_movrels_b32 would
still be emitted even though all of the values were only stored in
SGPRs.

llvm-svn: 249494
2015-10-07 00:42:51 +00:00
Reid Kleckner 72ba70418f [SEH] Add llvm.eh.exceptioncode intrinsic
This will support the Clang __exception_code intrinsic.

llvm-svn: 249492
2015-10-07 00:27:33 +00:00
David Majnemer 7735a6d07a [WinEH] Create a separate MBB for funclet prologues
Our current emission strategy is to emit the funclet prologue in the
CatchPad's normal destination.  This is problematic because
intra-funclet control flow to the normal destination is not erroneous
and results in us reevaluating the prologue if said control flow is
taken.

Instead, use the CatchPad's location for the funclet prologue.  This
correctly models our desire to have unwind edges evaluate the prologue
but edges to the normal destination result in typical control flow.

Differential Revision: http://reviews.llvm.org/D13424

llvm-svn: 249483
2015-10-06 23:31:59 +00:00
Tom Stellard 0fbf899c0f AMDGPU/SI: Remove calling convention assertion from LowerFormalArguments()
Summary:
We currently ignore the calling convention, so there is no real reason to
assert on the calling convention of functions.

Reviewers: arsenm

Subscribers: arsenm, llvm-commits

Differential Revision: http://reviews.llvm.org/D13367

llvm-svn: 249468
2015-10-06 21:16:34 +00:00
Chad Rosier cb14dd0265 [ARM] Simplify tests and make checks more rigid. NFC.
llvm-svn: 249432
2015-10-06 17:54:12 +00:00
Krzysztof Parzyszek fb33824efd [Hexagon] Add an early if-conversion pass
llvm-svn: 249423
2015-10-06 15:49:14 +00:00
Daniel Sanders 1b3341724c [mips][microMIPS] Fix an issue with selecting sqrt instruction in LLVM backend
Summary:
This fixes 7 tests during fast LLVM test-suite run:
* MultiSource/Benchmarks/McCat/18-imp/imp
* MultiSource/Applications/oggenc/oggenc
* MultiSource/Benchmarks/MallocBench/gs/gs
* MultiSource/Benchmarks/MiBench/automotive-susan/automotive-susan
* MultiSource/Benchmarks/VersaBench/beamformer/beamformer
* MultiSource/Benchmarks/MiBench/consumer-lame/consumer-lame
* MultiSource/Benchmarks/Bullet/bullet

Error message was in the form of:
fatal error: error in backend: Cannot select: 0x95c3288: f32 = fsqrt 0x95c0190 [ORD=9] [ID=18]
  0x95c0190: f32 = fadd 0x95bef30, 0x95c4d00 [ORD=8] [ID=17]
    0x95bef30: f32 = fmul 0x95c4988, 0x95c4988 [ORD=5] [ID=16]
...

There was problem with selecting sqrt instruction in LLVM backend.

To fix the issue changes are made in TableGen definition for sqrt instruction in MipsInstrFPU.td and new test file sqrt.ll is added to LLVM regression tests.

Patch by Zlatko Buljan

Reviewers: zoran.jovanovic, hvarga, dsanders

Subscribers: llvm-commits, petarj

Differential Revision: http://reviews.llvm.org/D13235

llvm-svn: 249416
2015-10-06 15:17:25 +00:00
Daniel Sanders add9057fa7 Revert r249123 - [mips][microMIPS] Fix an issue with selecting sqrt instruction in LLVM backend
The author was not credited and most of the commit message is missing. Will re-commit with this fixed.

llvm-svn: 249415
2015-10-06 15:13:16 +00:00
Craig Topper 2c4068f409 [TwoAddressInstructionPass] When looking for a 3 addr conversion after commuting, make sure regB has been updated to take into account the commute.
llvm-svn: 249378
2015-10-06 05:39:59 +00:00
Alexei Starovoitov 4e01a38da0 [bpf] Avoid extra pointer arithmetic for stack access
For the program like below
struct key_t {
  int pid;
  char name[16];
};
extern void test1(char *);
int test() {
  struct key_t key = {};
  test1(key.name);
  return 0;
}
For key.name, the llc/bpf may generate the below code:
  R1 = R10  // R10 is the frame pointer
  R1 += -24 // framepointer adjustment
  R1 |= 4   // R1 is then used as the first parameter of test1
OR operation is not recognized by in-kernel verifier.

This patch introduces an intermediate FI_ri instruction and
generates the following code that can be properly verified:
  R1 = R10
  R1 += -20

Patch by Yonghong Song <yhs@plumgrid.com>

llvm-svn: 249371
2015-10-06 04:00:53 +00:00
Craig Topper 79dd1bf094 [X86] Teach constant hoisting that ANDs with 64-bit immediates in the range 0x80000000-0xffffffff can be handled cheaply and don't need to be hoisted.
Most importantly, this keeps constant hoisting from preventing instruction selections ability to turn an AND with 0xffffffff into a move into a 32-bit subregister.

llvm-svn: 249370
2015-10-06 02:50:24 +00:00
Dan Gohman e51c058ecc [WebAssembly] Switch to a more traditional assembly syntax
This new syntax is built around putting each instruction on its own line
in a "mnemonic op, op, op" like syntax. It also uses conventional data
section directives like ".byte" and so on rather than requiring everything
to be in hierarchical S-expression format. This is a more natural syntax
for a ".s" file format from the perspective of LLVM MC and related tools,
while remaining easy to translate into other forms as needed.

llvm-svn: 249364
2015-10-06 00:27:55 +00:00
Scott Douglass 953f908173 [ARM] Modify codegen for memcpy intrinsic to prefer LDM/STM.
We were previously codegen'ing memcpy as regular load/store operations and
hoping that the register allocator would allocate registers in ascending order
so that we could apply an LDM/STM combine after register allocation. According
to the commit that first introduced this code (r37179), we planned to teach the
register allocator to allocate the registers in ascending order. This never got
implemented, and up to now we've been stuck with very poor codegen.

A much simpler approach for achieving better codegen is to create MEMCPY pseudo
instructions, attach scratch virtual registers to them and then, post register
allocation, expand the MEMCPYs into LDM/STM pairs using the scratch registers.
The register allocator will have picked arbitrary registers which we sort when
expanding the MEMCPY. This approach also avoids the need to repeatedly calculate
offsets which ultimately ought to be eliminated pre-RA in order to decrease
register pressure.

Fixes PR9199 and PR23768.

[This is based on Peter Collingbourne's r238473 which was reverted.]

Differential Revision: http://reviews.llvm.org/D13239

Change-Id: I727543c2e94136e0f80b8e22d5642d7b9ee5b458
Author: Peter Collingbourne <peter@pcc.me.uk>
llvm-svn: 249322
2015-10-05 14:49:54 +00:00
Simon Pilgrim bb01c6fda2 [X86][SSE4A] Added shuffle decode tests for 'special case' SSE4A EXTRQI/INSERTQI ops.
llvm-svn: 249263
2015-10-04 10:12:53 +00:00
Igor Breger 78741a1b1e AVX512: Implemented encoding and intrinsics for VPERMILPS/PD instructions.
Added tests for intrinsics and encoding.

Differential Revision: http://reviews.llvm.org/D12690

llvm-svn: 249261
2015-10-04 07:20:41 +00:00
David Majnemer 161935520d [WinEH] Permit branch folding in the face of funclets
Track which basic blocks belong to which funclets.  Permit branch
folding to fire but only if it can prove that doing so will not cause
code in one funclet to be reused in another.

llvm-svn: 249257
2015-10-04 02:22:52 +00:00
Simon Pilgrim dde63374c5 [DAGCombiner] Generalize FADD constant combines to work with vectors
Updated the FADD combines to work with vectors as well as scalars.

Differential Revision: http://reviews.llvm.org/D13416

llvm-svn: 249251
2015-10-03 22:06:06 +00:00
Sanjay Patel 004ea240ad add test cases that demonstrate bad behavior
These are based on PR25016 and likely caused by a bug in
MachineCombiner's definition of improvesCriticalPathLen().

llvm-svn: 249249
2015-10-03 20:52:55 +00:00
Simon Pilgrim 93ea954e6d [X86][SSE] Add FADD combine tests.
llvm-svn: 249240
2015-10-03 18:17:43 +00:00
Dan Gohman dc51b96b7f [WebAssembly] Implement the remaining conversion operations.
This is a temporary assembly syntax that will likely evolve along with
broader upcoming syntax changes.

llvm-svn: 249225
2015-10-03 02:10:28 +00:00
Dan Gohman 6a050f30de [WebAssembly] Rename setlocal to set_local to match the spec.
llvm-svn: 249218
2015-10-03 00:01:53 +00:00
Dan Gohman eb440092c9 [WebAssembly] Update this test for the new loop scheme.
llvm-svn: 249217
2015-10-02 23:54:03 +00:00
Dan Gohman e3e4a5ff52 [WebAssembly] Fix CFG stackification of nested loops.
llvm-svn: 249187
2015-10-02 21:11:36 +00:00
Dan Gohman 9cc692b06e [WebAssembly] Support calls marked as "tail", fastcc, and coldcc.
llvm-svn: 249184
2015-10-02 20:54:23 +00:00
Richard Trieu e0129e474d Call the correct overload.
Call the correct overload so a string literal does not get converted to a bool.
Also fix the test case to match the names given.

llvm-svn: 249183
2015-10-02 20:52:14 +00:00
Dan Gohman baba8c648b [WebAssembly] Add a resize_memory intrinsic.
llvm-svn: 249178
2015-10-02 20:10:26 +00:00
Dan Gohman 72f1692a2c [WebAssembly] Add a memory_size intrinsic.
llvm-svn: 249171
2015-10-02 19:21:15 +00:00
Tim Northover 956b008db6 ARM: correctly align constant pool value on Thumb1 targets.
Since we're using tLDRpci to access it, the constant pool's address must be 0
(mod 4).

llvm-svn: 249163
2015-10-02 18:07:13 +00:00
Andrea Di Biagio 77f62652c1 Reapply r249121 : "[FastISel][x86] Teach how to select SSE2/AVX bitcasts between 128/256-bit vector types."
This patch teaches FastIsel the following two things:
1) On SSE2, no instructions are needed for bitcasts between 128-bit vector types;
2) On AVX, no instructions are needed for bitcasts between 256-bit vector types.

Example:

  %1 = bitcast <4 x i31> %V to <2 x i64>

Before (-fast-isel -fast-isel-abort=1):

  FastIsel miss: %1 = bitcast <4 x i31> %V to <2 x i64>

Now we don't fall back to SelectionDAG and we correctly fold that computation
propagating the register associated to %V.

Originally reviewed here: http://reviews.llvm.org/D13347

llvm-svn: 249147
2015-10-02 16:08:05 +00:00
Andrea Di Biagio 45874e67a1 Revert: [FastISel][x86] Teach how to select SSE2/AVX bitcasts between 128/256-bit vector types.
r249121 caused a Clang test failure (avx2-buitins.c).
Revert r249121 while I keep investigating on the reason why that test failed.

llvm-svn: 249124
2015-10-02 13:06:19 +00:00
Zoran Jovanovic 9ffdfa5986 [mips][microMIPS] Fix an issue with selecting sqrt instruction in LLVM backend
Differential Revision: http://reviews.llvm.org/D13235

llvm-svn: 249123
2015-10-02 13:06:02 +00:00
Andrea Di Biagio cb33456122 [FastISel][x86] Teach how to select SSE2/AVX bitcasts between 128/256-bit vector types.
This patch teaches FastIsel the following two things:
1) On SSE2, no instructions are needed for bitcasts between 128-bit vector types;
2) On AVX, no instructions are needed for bitcasts between 256-bit vector types.

Example:

  %1 = bitcast <4 x i31> %V to <2 x i64>

Before (-fast-isel -fast-isel-abort=1):

  FastIsel miss: %1 = bitcast <4 x i31> %V to <2 x i64>

Now we don't fall back to SelectionDAG and we correctly fold that computation
propagating the register associated to %V.

Differential Revision: http://reviews.llvm.org/D13347

llvm-svn: 249121
2015-10-02 12:45:37 +00:00
Reid Kleckner fc64fae6e3 [WinEH] Emit __C_specific_handler tables for the new IR
We emit denormalized tables, where every range of invokes in the same
state gets a complete list of EH action entries. This is significantly
simpler than trying to infer the correct nested scoping structure from
the MI. Fortunately, for SEH, the nesting structure is really just a
size optimization.

With this, some basic __try / __except examples work.

llvm-svn: 249078
2015-10-01 21:38:24 +00:00
Tom Stellard e9f8b24985 AMDGPU/SI: Remove assert from AMDGPUOpenCLImageTypeLowering pass
Summary:
Instead of asserting when the kernel metadata is different than we expect,
we should just skip lowering that function.  This fixes assertion
failures with OpenCL argument metadata from older LLVM releases.

Reviewers: arsenm

Subscribers: arsenm, llvm-commits

Differential Revision: http://reviews.llvm.org/D13356

llvm-svn: 249073
2015-10-01 21:16:05 +00:00
David Majnemer 4600c06434 [WinEH] Stop BranchFolding from merging across funclets
BranchFolding would merge two funclets together, this is not OK.
Disable this and strengthen the assertion in FuncletLayout.

llvm-svn: 249069
2015-10-01 21:04:13 +00:00
David Majnemer f828a0ccc7 [WinEH] Make FuncletLayout more robust against catchret
Catchret transfers control from a catch funclet to an earlier funclet.
However, it is not completely clear which funclet the catchret target is
part of.  Make this clear by stapling the catchret target's funclet
membership onto the CATCHRET SDAG node.

llvm-svn: 249052
2015-10-01 18:44:59 +00:00
Jonas Paulsson 12629324a4 [SystemZ] Add some generic (floating point support) load instructions.
Add generic instructions for load complement, load negative and load positive
for fp32 and fp64, and let isel prefer them. They do not clobber CC, and so
give scheduler more freedom. SystemZElimCompare pass will convert them when it
can to the CC-setting variants.

Regression tests updated to expect the new opcodes in places where the old ones
where used. New test case SystemZ/fp-cmp-05.ll checks that
SystemZCompareElim.cpp can handle the new opcodes.

README.txt updated (bullet removed).

Note that fp128 is not yet handled, because it is relatively rare, and is a
bit trickier, because of the fact that l.dfr would operate on the sign bit of
one of the subregisters of a fp128, but we would not want to copy the other
sub-reg in case src and dst regs are not the same.

Reviewed by Ulrich Weigand.

llvm-svn: 249046
2015-10-01 18:12:28 +00:00
Tom Stellard e0e582c9aa AMDGPU: Add MEM_RAT STORE_TYPED.
v2: Add test (Matt).
    Fix capitalization of isEOP (Matt).
    Move pattern to class parameter (Matt).
    Make the instruction available to Cayman (Matt).
    Change name from MEM_RAT WRITE_TYPED to MEM_RAT STORE_TYPED.

Patch by: Zoltan Gilian

llvm-svn: 249042
2015-10-01 17:51:34 +00:00
NAKAMURA Takumi 1ed20db720 Revert r248959, "[WinEH] Emit int3 after noreturn calls on Win64"
It broke; LLVM :: CodeGen__Generic__2009-11-16-BadKillsCrash.ll

llvm-svn: 249032
2015-10-01 17:00:56 +00:00
Scott Douglass 290183d734 [ARM] More care with Thumb1 writeback in ARMLoadStoreOptimizer
Differential Revision: http://reviews.llvm.org/D13240

llvm-svn: 249002
2015-10-01 11:56:19 +00:00
Ahmed Bougacha 23a0d1a1d6 [X86] Don't custom-lower vNi32 uint_to_fp when unsafe-fp-math.
The custom code produces incorrect results if later reassociated.

Since r221657, on x86, vNi32 uitofp is lowered using an optimized
sequence:

  movdqa LCPI0_0(%rip), %xmm1 ## xmm1 = [65535, ...]
  pand %xmm0, %xmm1
  por LCPI0_1(%rip), %xmm1 ## [0x4b000000, ...]
  psrld $16, %xmm0
  por LCPI0_2(%rip), %xmm0 ## [0x53000000, ...]
  addps LCPI0_3(%rip), %xmm0 ## [float -5.497642e+11, ...]
  addps %xmm1, %xmm0

Since r240361, the machine combiner opportunistically reassociates
2-instruction sequences (with -ffast-math). In the new code sequence,
the ADDPS' are eligible. In isolation, for simple examples (without
reassociable users), this makes no performance difference (the goal
being to enable reassociation of longer chains).

In the trivial example (just one uitofp), the reassociation doesn't
happen, because (I think) it would require the emission of a separate
movaps for a constantpool load (instead of folding it into addps).

However, when we have multiple uitofp sequences, and the constantpool
loads are CSE'd earlier, the machine combiner can do the reassociation.

When the ADDPS' are reassociated, the resulting sequence isn't correct
anymore, as we'd be adding large (2**39) constants with comparatively
smaller values (~2**23). Given that two of the three inputs are powers
of 2 larger than 2**16, and that ulp(2**39) == 2**(39-24) == 2**15,
the reassociated chain will produce 0 for any input in [0, 2**14[.
In my testing, it also produces wrong results for 99.5% of [0, 2**32[.

Avoid this by disabling the new lowering when -ffast-math. It does
mean that we'll get slower code than without it, but at least we
won't get egregiously incorrect code.

One might argue that, considering -ffast-math is all but meaningless,
uitofp producing wrong results isn't a compiler bug. But it really is.

Fixes PR24512.

...though this is really more of a workaround.
Ideally, we'd have some sort of Machine FMF, but that's a problem
that's not worth tackling until we do more with machine IR.

llvm-svn: 248965
2015-10-01 00:11:07 +00:00
Reid Kleckner 6dec87a8a0 [WinEH] Emit int3 after noreturn calls on Win64
The Win64 unwinder disassembles forwards from each PC to try to
determine if this PC is in an epilogue. If so, it skips calling the EH
personality function for that frame. Typically, this means you cannot
catch an exception in the same frame that you threw it, because 'throw'
calls a noreturn runtime function.

Previously we avoided this problem with the TrapUnreachable
TargetOption, but that's a much bigger hammer than we need. All we need
is a 1 byte non-epilogue instruction right after the call.  Instead,
what we got was an unconditional branch to a shared block containing the
ud2, potentially 7 bytes instead of 1. So, this reverts r206684, which
added TrapUnreachable, and replaces it with something better.

The new code pattern matches for invoke/call followed by unreachable and
inserts an int3 into the DAG. To be 100% watertight, we would need to
insert SEH_Epilogue instructions into all basic blocks ending in a call
with no terminators or successors, but in practice this is unlikely to
come up.

llvm-svn: 248959
2015-09-30 23:09:23 +00:00
Sanjay Patel a114a10bbe [x86] enable machine combiner reassociations for 256-bit vector logical integer insts
llvm-svn: 248955
2015-09-30 22:25:55 +00:00
Chad Rosier 4c5a4646bf [AArch64] Remove an unnecessary run line and other cleanup. NFC.
Unscaled load/store combining has been enabled since the initial ARM64 port.  No
need for a redundance run.  Also, add CHECK-LABEL directives.

llvm-svn: 248945
2015-09-30 21:10:02 +00:00
Chad Rosier 11c825f7db [AArch64] Remove an unnecessary restriction on pre-index instructions.
Previously, the index was constrained to the size of the memory operation for
no apparent reason.  This change removes that constraint so that we can form
pre-index instructions with any valid offset.

llvm-svn: 248931
2015-09-30 19:44:40 +00:00
Hal Finkel 4c45775880 [PowerPC] Disable shrink wrapping
Shrink wrapping is causing a self-hosting failure on PPC64/Linux. Disable for
now until the problem can be fixed.

llvm-svn: 248924
2015-09-30 17:29:03 +00:00
Jeroen Ketema ab99b59e8c [ARM][NEON] Use address space in vld([1234]|[234]lane) and vst([1234]|[234]lane) instructions
This commit changes the interface of the vld[1234], vld[234]lane, and vst[1234],
vst[234]lane ARM neon intrinsics and associates an address space with the
pointer that these intrinsics take. This changes, e.g.,

<2 x i32> @llvm.arm.neon.vld1.v2i32(i8*, i32)

to

<2 x i32> @llvm.arm.neon.vld1.v2i32.p0i8(i8*, i32)

This change ensures that address spaces are fully taken into account in the ARM
target during lowering of interleaved loads and stores.

Differential Revision: http://reviews.llvm.org/D12985

llvm-svn: 248887
2015-09-30 10:56:37 +00:00
Simon Pilgrim 3d11c994f7 [X86][XOP] Added support for the lowering of 128-bit vector shifts to XOP shift instructions
The XOP shifts just have logical/arithmetic versions and the left/right shifts are controlled by whether the value is positive/negative. Because of this I've added new X86ISD nodes instead of trying to force them to use the existing shift nodes.

Additionally Excavator cores (bdver4) support XOP and AVX2 - meaning that it should use the AVX2 shifts when it can and fall back to XOP in other cases.

Differential Revision: http://reviews.llvm.org/D8690

llvm-svn: 248878
2015-09-30 08:17:50 +00:00
Evgeniy Stepanov d3f544f271 [safestack] Fix a stupid mix-up in the direct-tls code path.
llvm-svn: 248863
2015-09-30 00:01:47 +00:00
Reid Kleckner a13dfd539b [WinEH] Setup RBP correctly in Win64 funclet prologues
Previously local variable captures just didn't work in 64-bit. Now we
can access local variables more or less correctly.

llvm-svn: 248857
2015-09-29 23:32:01 +00:00
David Majnemer 91b0ab9172 [WinEH] Ensure that funclets obey the x64 ABI
The x64 ABI requires that epilogues do not contain code other than stack
adjustments and some limited control flow.  However, we'd insert code to
initialize the return address after stack adjustments.  Instead, insert
EAX/RAX with the current value before we create the stack adjustments in
the epilogue.

llvm-svn: 248839
2015-09-29 22:33:36 +00:00
Maksim Panchenko cce239c45d HHVM calling conventions.
HHVM calling convention, hhvmcc, is used by HHVM JIT for
functions in translated cache. We currently support LLVM back end to
generate code for X86-64 and may support other architectures in the
future.

In HHVM calling convention any GP register could be used to pass and
return values, with the exception of R12 which is reserved for
thread-local area and is callee-saved. Other than R12, we always
pass RBX and RBP as args, which are our virtual machine's stack pointer
and frame pointer respectively.

When we enter translation cache via hhvmcc function, we expect
the stack to be aligned at 16 bytes, i.e. skewed by 8 bytes as opposed
to standard ABI alignment. This affects stack object alignment and stack
adjustments for function calls.

One extra calling convention, hhvm_ccc, is used to call C++ helpers from
HHVM's translation cache. It is almost identical to standard C calling
convention with an exception of first argument which is passed in RBP
(before we use RDI, RSI, etc.)

Differential Revision: http://reviews.llvm.org/D12681

llvm-svn: 248832
2015-09-29 22:09:16 +00:00
Chad Rosier 1769d8505f Fix test from r248825.
llvm-svn: 248827
2015-09-29 20:50:15 +00:00
Chad Rosier 4315012769 [AArch64] Add support for pre- and post-index LDPSWs.
llvm-svn: 248825
2015-09-29 20:39:55 +00:00
David Majnemer a80c151286 [WinEH] Teach AsmPrinter about funclets
Summary:
Funclets have been turned into functions by the time they hit the object
file.  Make sure that they have decent names for the symbol table and
CFI directives explaining how to reason about their prologues.

Differential Revision: http://reviews.llvm.org/D13261

llvm-svn: 248824
2015-09-29 20:12:33 +00:00
Chad Rosier dabe2534ed [AArch64] Add integer pre- and post-index halfword/byte loads and stores.
llvm-svn: 248817
2015-09-29 18:26:15 +00:00
Jeroen Ketema 740f9d79ca Arguments spilled on the stack before a function call may have
alignment requirements, for example in the case of vectors.
These requirements are exploited by the code generator by using
move instructions that have similar alignment requirements, e.g.,
movaps on x86.

Although the code generator properly aligns the arguments with
respect to the displacement of the stack pointer it computes,
the displacement itself may cause misalignment. For example if
we have

%3 = load <16 x float>, <16 x float>* %1, align 64
call void @bar(<16 x float> %3, i32 0)

the x86 back-end emits:

movaps  32(%ecx), %xmm2
movaps  (%ecx), %xmm0
movaps  16(%ecx), %xmm1
movaps  48(%ecx), %xmm3
subl    $20, %esp       <-- if %esp was 16-byte aligned before this instruction, it no longer will be afterwards 
movaps  %xmm3, (%esp)   <-- movaps requires 16-byte alignment, while %esp is not aligned as such.
movl    $0, 16(%esp)
calll   __bar

To solve this, we need to make sure that the computed value with which
the stack pointer is changed is a multiple af the maximal alignment seen
during its computation. With this change we get proper alignment:

subl    $32, %esp
movaps  %xmm3, (%esp)

Differential Revision: http://reviews.llvm.org/D12337

llvm-svn: 248786
2015-09-29 10:12:57 +00:00
Dan Gohman 868e1c08d9 [WebAssembly] Rename test files to match platform naming conventions.
llvm-svn: 248783
2015-09-29 08:13:58 +00:00
Reid Kleckner c71d6275ca [WinEH] Fix ip2state table emission with funclets
Previously we were hijacking the old LandingPadInfo data structures to
communicate our state numbers. Now we don't need that anymore.

llvm-svn: 248763
2015-09-28 23:56:30 +00:00
Matt Arsenault 73aa8f687a AMDGPU: Fix splitting x16 SMRD loads
When used recursively, this would set the kill flag
on the intermediate step from first splitting
x16 to x8.

llvm-svn: 248741
2015-09-28 20:54:52 +00:00
Matt Arsenault e5d042cd56 AMDGPU: Fix moving SMRD loads with literal offsets on CI
llvm-svn: 248740
2015-09-28 20:54:46 +00:00
Matt Arsenault b378f075a2 AMDGPU: Add testcases
Make sure we are testing moving users
of the moved and split SMRD loads.

llvm-svn: 248738
2015-09-28 20:54:38 +00:00
Matt Arsenault f3c91f573f AMDGPU: Cleanup test
Run instnamer on it, and rename check prefix.

This is in preparation for adding new testcases to cover
bugs on other subtargets.

llvm-svn: 248737
2015-09-28 20:54:32 +00:00
Dan Gohman 05a17aa82a [WebAssembly] Support for direct call and call_indirect.
llvm-svn: 248716
2015-09-28 16:22:39 +00:00
Hal Finkel bd582581b8 [DAGCombine] Fix getStoreMergeAndAliasCandidates's AA-enabled chain walking
When AA is being used, non-aliasing stores are canonicalized to use the same
chain, and DAGCombiner::getStoreMergeAndAliasCandidates can take advantage of
this by looking only as users of a store's chain operand. However, user
iteration is not result-number specific, we need to check that the use is as a
chain operand, and not via some other operand. It is certainly possible to have
another potentially-aliasing store, which shares the first's base pointer, and
uses the first's chain's node via some other operand.

Failure to catch this situation caused, at least in the included test case, an
assert later because the relative sequence-number ordering caused later
replacement to create a cycle in the DAG.

llvm-svn: 248698
2015-09-28 08:02:14 +00:00
Joseph Tremoulet 09af67aba5 [EH] Create removeUnwindEdge utility
Summary:
Factor the code that rewrites invokes to calls and rewrites WinEH
terminators to their "unwind to caller" equivalents into a helper in
Utils/Local, and use it in the three places I'm aware of that need to do
this.


Reviewers: andrew.w.kaylor, majnemer, rnk

Subscribers: llvm-commits

Differential Revision: http://reviews.llvm.org/D13152

llvm-svn: 248677
2015-09-27 01:47:46 +00:00
Matt Arsenault 86095b8dec AMDGPU: Fix sched model for VOP2b instructions
Trying to use the version with the explicit output operand
would complain because of the missing WriteSALU. I'm not sure
why it doesn't complain about this with the implicit VCC def.

llvm-svn: 248646
2015-09-26 02:25:45 +00:00
Dan Gohman d0bf981296 [WebAssembly] Rename several functions and types according to the new spec.
llvm-svn: 248644
2015-09-26 01:09:44 +00:00
Ahmed Bougacha e81610fabb [ARM] Don't generate clrex for pre-v7 targets.
Since r248294, we emit clrex, but it doesn't exist on v6.

llvm-svn: 248640
2015-09-26 00:14:02 +00:00
Cong Hou 15ea016346 Use fixed-point representation for BranchProbability.
BranchProbability now is represented by its numerator and denominator in uint32_t type. This patch changes this representation into a fixed point that is represented by the numerator in uint32_t type and a constant denominator 1<<31. This is quite similar to the representation of BlockMass in BlockFrequencyInfoImpl.h. There are several pros and cons of this change:

Pros:

1. It uses only a half space of the current one.
2. Some operations are much faster like plus, subtraction, comparison, and scaling by an integer.

Cons:

1. Constructing a probability using arbitrary numerator and denominator needs additional calculations.
2. It is a little less precise than before as we use a fixed denominator. For example, 1 - 1/3 may not be exactly identical to 1 / 3 (this will lead to many BranchProbability unit test failures). This should not matter when we only use it for branch probability. If we use it like a rational value for some precise calculations we may need another construct like ValueRatio.

One important reason for this change is that we propose to store branch probabilities instead of edge weights in MachineBasicBlock. We also want clients to use probability instead of weight when adding successors to a MBB. The current BranchProbability has more space which may be a concern.

Differential revision: http://reviews.llvm.org/D12603

llvm-svn: 248633
2015-09-25 23:09:59 +00:00
Matthias Braun a3b701f828 SelectionDAGDumper: Print simple operands inline.
Print simple operands inline instead of their pointer/value number.
Simple operands are SDNodes without predecessors like Constant(FP), Register,
UNDEF. This unifies the behaviour with dumpr() which was already doing this.

Previously:
  t0: ch = EntryToken
    t1: i64 = Register %vreg0
  t2: i64,ch = CopyFromReg t0, t1
    t3: i64 = Constant<1>
  t4: i64 = add t2, t3
    t5: i64 = Constant<2>
  t6: i64 = add t2, t5
  t10: i64 = undef
  t11: i8,ch = load t0, t2, t10<LD1[%tmp81]>
  t12: i8,ch = load t0, t4, t10<LD1[%tmp10]>
  t13: i8,ch = load t0, t6, t10<LD1[%tmp12]>

Now:
  t0: ch = EntryToken
  t2: i64,ch = CopyFromReg t0, Register:i64 %vreg0
  t4: i64 = add t2, Constant:i64<1>
  t6: i64 = add t2, Constant:i64<2>
  t11: i8,ch = load<LD1[%tmp81]> t0, t2, undef:i64
  t12: i8,ch = load<LD1[%tmp10]> t0, t4, undef:i64
  t13: i8,ch = load<LD1[%tmp12]> t0, t6, undef:i64

Differential Revision: http://reviews.llvm.org/D12567

llvm-svn: 248628
2015-09-25 22:27:02 +00:00
Sanjay Patel bbbf9a1a34 merge vector stores into wider vector stores and fix AArch64 misaligned access TLI hook (PR21711)
This is a redo of D7208 ( r227242 - http://llvm.org/viewvc/llvm-project?view=revision&revision=227242 ).

The patch was reverted because an AArch64 target could infinite loop after the change in DAGCombiner 
to merge vector stores. That happened because AArch64's allowsMisalignedMemoryAccesses() wasn't telling
the truth. It reported all unaligned memory accesses as fast, but then split some 128-bit unaligned
accesses up in performSTORECombine() because they are slow.

This patch attempts to fix the problem in AArch's allowsMisalignedMemoryAccesses() while preserving
existing (perhaps questionable) lowering behavior.

The x86 test shows that store merging is working as intended for a target with fast 32-byte unaligned
stores.

Differential Revision: http://reviews.llvm.org/D12635
 

llvm-svn: 248622
2015-09-25 21:49:48 +00:00
Matthias Braun e86bbd8979 PrologueEpilogInserter: Fix missing live-ins when savepoint equals restorepoint
The algorithm would not modify the live-in list of blocks below the save
block point which is correct unless it happens to be a restore point at
the same time.
Also fixes the benign issue of live-in registers being added twice in
some cases.

The testcase is based on a test submitted by Kit Barton.

Differential Revision: http://reviews.llvm.org/D13176

llvm-svn: 248620
2015-09-25 21:41:40 +00:00
Tom Stellard e135ffd554 AMDGPU/SI: Use .hsatext section instead of .text for HSA
Reviewers: arsenm, grosbach, rafael

Subscribers: arsenm, llvm-commits

Differential Revision: http://reviews.llvm.org/D12424

llvm-svn: 248619
2015-09-25 21:41:28 +00:00
Matt Arsenault 10aa807856 PeepholeOptimizer: Remove redundant copies
If a virtual register is copied and another copy was already
seen, replace with the previous copy. This only handles the
simplest cases for now.

This pattern shows up from various operand restrictions
AMDGPU has which require inserting copies depending
on the register class of the operands.

llvm-svn: 248611
2015-09-25 20:22:12 +00:00
Matt Arsenault 28bd7d4afe AMDGPU: Add some more tests for literal operands
llvm-svn: 248600
2015-09-25 18:21:47 +00:00
Chad Rosier 1bbd7fb38e [AArch64] Add support for generating pre- and post-index load/store pairs.
llvm-svn: 248593
2015-09-25 17:48:17 +00:00
Matt Arsenault 4bf43d4e68 AMDGPU: Handle i64->v2i32 loads/stores in PreprocessISelDAG
This fixes a select error when the i64 source was also
bitcasted to v2i32 in the original source.

Instead of awkwardly trying to select the modified source value and
the store, replace before isel begins.

Uses a worklist to avoid possible problems from mutating the DAG,
although it seems to work OK without it.

llvm-svn: 248589
2015-09-25 17:27:08 +00:00
Matt Arsenault 5f70436c49 AMDGPU: Improve accuracy of instruction rates for VOPC
These were all using the default 32-bit VALU write class,
but the i64/f64 compares are half rate.

I'm not sure this is really correct, because they are still using
the write to VALU write class, even though they really write
to the SALU.

llvm-svn: 248582
2015-09-25 16:58:25 +00:00
Saleem Abdulrasool fe83b50289 ARM: address WoA division limitation
We now emit the compiler generated divide by zero check that was needed for the
MSVC routines.  We construct a psuedo-instruction for the DBZ check as the
operation requires splitting up the BB.  For the 64-bit operations, we need to
custom expand the node as we need to insert the DBZ check and then emit the
libcall to the appropriate name.  Because this is target specific, it seemed
better to reproduce the expansion operation from the target-agnostic type
legalization rather than sink this there to avoid the duplication.  The division
library calls now match MSVC semantically.

llvm-svn: 248561
2015-09-25 05:15:46 +00:00
Simon Pilgrim 68d0050c6a [X86][SSE2] Fix zero/any extension shuffles that don't start from the first element
Fix for D12561 - we weren't correctly ensuring that the base element for extension was moved to start on a boundary suitable for UNPCKL/H

llvm-svn: 248536
2015-09-24 21:02:17 +00:00
Matt Arsenault e66621b306 AMDGPU: Add s_dcache_* instructions
llvm-svn: 248533
2015-09-24 19:52:27 +00:00
Matt Arsenault d6adfb401c AMDGPU: Add cache invalidation instructions.
These are necessary for implementing mem_fence for
OpenCL 2.0.

The VI assembler tests are disabled since it seems to be
using the wrong encoding or opcode.

llvm-svn: 248532
2015-09-24 19:52:21 +00:00
Mohammad Shahid d0203cbf1c Regression Test: Deletes redundant/invalid test.
Removes absdiff_expand.ll regression test file which is invalid.

Diffrential Revision: http://reviews.llvm.org/D11678

llvm-svn: 248493
2015-09-24 14:37:25 +00:00
Mohammad Shahid 13f1dfdf2e Codegen: Fix llvm.*absdiff semantic.
Fixes the overflow case of llvm.*absdiff intrinsic also updats the tests and LangRef.rst accordingly.

Differential Revision: http://reviews.llvm.org/D11678

llvm-svn: 248483
2015-09-24 10:35:03 +00:00
Matt Arsenault 68d938649e Introduce target hook for optimizing register copies
Allow a target to do something other than search for copies
that will avoid cross register bank copies.

Implement for SI by only rewriting the most basic copies,
so it should look through anything like a subregister extract.

I'm not entirely satisified with this because it seems like
eliminating a reg_sequence that isn't fully used should work
generically for all targets without them having to override
something. However, it seems to be tricky to have a simple
implementation of this without rewriting to invalid  kinds
of subregister copies on some targets.

I'm not sure if there is currently a generic way to easily check
if a subregister index would be valid for the current use.
The current set of TargetRegisterInfo::get*Class functions don't
quite behave like I would expect (e.g. getSubClassWithSubReg
returns the maximal register class rather than the minimal), so
I'm not sure how to make the generic test keep searching if
SrcRC:SrcSubReg is a valid replacement for DefRC:DefSubReg. Making
the default implementation to check for simple copies breaks
a variety of ARM and x86 tests by producing illegal subregister uses.

The ARM tests are not actually changed since it should still be using
the same sharesSameRegisterFile implementation, this just relaxes
them to not check for specific registers.

llvm-svn: 248478
2015-09-24 08:36:14 +00:00
Matt Arsenault cab64f1c75 AMDGPU: Fix printing trailing whitespace for mubuf atomics
llvm-svn: 248472
2015-09-24 07:51:17 +00:00
Matt Arsenault c721df0478 Use new TokenFactor chain when merging stores
If the stores are storing values from loads which partially
alias the stores, we could end up placing the merged loads
and stores on the same chain which has the potential to break.
Each store may have a different chain dependency on only some
of the original loads. Create a new TokenFactor to capture all
of the required dependencies of the stores rather than assuming
all stores can use the same chain.

The testcase is a situation where this happens, although
it does not have an observable change from this. The DAG nodes
just happened to not be reordered before despite this missing
chain dependency.

This is based on an off-list report for an out of tree target
which regressed due to r246307 and I haven't managed to find a case
where the nodes do end up reordered with an in tree target.

llvm-svn: 248468
2015-09-24 07:22:38 +00:00
Matt Arsenault c8e2ce4046 AMDGPU: Reduce number of copies emitted
Instead of always inserting a copy in case
the super register is itself a subregister,
only extract to the super reg class if this is
actually the case.

This shouldn't really change codegen, but
makes looking at the output of SIFixSGPRCopies
easier to read.

llvm-svn: 248467
2015-09-24 07:16:37 +00:00
Tim Northover beb5bccf88 ARM: fix folding stack adjustment (again again again...)
This time, the issue is that we weren't accounting for the possibility that
aligned DPRs could have been stored after the final "push" in a prologue. When
that happened we effectively moved a "sub sp, #N" from below the aligned stores
to above them, and everything went to pot.

To make it worse, I'd actually committed something testing that we produced
wrong code, so the test update is tiny.

llvm-svn: 248437
2015-09-23 22:21:09 +00:00
Lawrence Hu cac0b89289 Swap loop invariant GEP with loop variant GEP to allow more LICM.
This patch changes the order of GEPs generated by Splitting GEPs
    pass, specially when one of the GEPs has constant and the base is
    loop invariant, then we will generate the GEP with constant first
    when beneficial, to expose more cases for LICM.

    If originally Splitting GEP generate the following:
      do.body.i:
        %idxprom.i = sext i32 %shr.i to i64
        %2 = bitcast %typeD* %s to i8*
        %3 = shl i64 %idxprom.i, 2
        %uglygep = getelementptr i8, i8* %2, i64 %3
        %uglygep7 = getelementptr i8, i8* %uglygep, i64 1032
      ...
    Now it genereates:
      do.body.i:
        %idxprom.i = sext i32 %shr.i to i64
        %2 = bitcast %typeD* %s to i8*
        %3 = shl i64 %idxprom.i, 2
        %uglygep = getelementptr i8, i8* %2, i64 1032
        %uglygep7 = getelementptr i8, i8* %uglygep, i64 %3
      ...

    For no-loop cases, the original way of generating GEPs seems to
    expose more CSE cases, so we don't change the logic for no-loop
    cases, and only limit our change to the specific case we are
    interested in.

llvm-svn: 248420
2015-09-23 19:25:30 +00:00
Sanjay Patel 1a6534661b [x86] replace integer 'xor' ops with packed SSE FP 'xor' ops when operating on FP scalars
Turn this:

movd %xmm0, %eax
movd %xmm1, %ecx
xorl %eax, %ecx
movd %ecx, %xmm0

into this:

xorps %xmm1, %xmm0

This is related to, but does not solve:
https://llvm.org/bugs/show_bug.cgi?id=22428

This is an extension of:
http://reviews.llvm.org/rL248395

llvm-svn: 248415
2015-09-23 18:33:42 +00:00
Sanjay Patel aba37553c4 [x86] replace integer 'or' ops with packed SSE FP 'or' ops when operating on FP scalars
Turn this:

movd %xmm0, %eax
movd %xmm1, %ecx
orl %eax, %ecx
movd %ecx, %xmm0

into this:

orps %xmm1, %xmm0

This is related to, but does not solve:
https://llvm.org/bugs/show_bug.cgi?id=22428

This is an extension of:
http://reviews.llvm.org/rL248395

llvm-svn: 248409
2015-09-23 18:19:07 +00:00
Sanjay Patel df2495f331 [x86] replace integer 'and' ops with packed SSE FP 'and' ops when operating on FP scalars
Turn this:
   movd %xmm0, %eax
   movd %xmm1, %ecx
   andl %eax, %ecx
   movd %ecx, %xmm0

into this:
   andps %xmm1, %xmm0


This is related to, but does not solve:
https://llvm.org/bugs/show_bug.cgi?id=22428

Differential Revision: http://reviews.llvm.org/D13065

llvm-svn: 248395
2015-09-23 17:00:06 +00:00
Simon Pilgrim 9cb018b6b6 [X86][SSE] Replace 128-bit SSE41 PMOVSX intrinsics with native IR
This patches removes the x86.sse41.pmovsx* intrinsics, provides a suitable upgrade path and updates relevant tests to sign extend a subvector instead.

LLVM counterpart to D12835

Differential Revision: http://reviews.llvm.org/D13002

llvm-svn: 248368
2015-09-23 08:48:33 +00:00
Cong Hou b54a72ef78 Add a test case for the fix of profile update issue when lowering switch statement.
llvm-svn: 248356
2015-09-23 00:34:56 +00:00
Matthias Braun 73e4221e6c LiveIntervalAnalysis: Avoid multiple connected liveness components
We may have subregister defs which are unused but not discovered and
cleaned up prior to liveness analysis. This creates multiple connected
components in the resulting live range which are forbidden in the
MachineVerifier because they would unnecesarily constrain the register
allocator. Rewrite those dead definitions to define a newly created
virtual register.

Differential Revision: http://reviews.llvm.org/D13035

llvm-svn: 248335
2015-09-22 22:37:44 +00:00
Ahmed Bougacha 81616a72ea [ARM] Emit clrex in the expanded cmpxchg fail block.
ARM counterpart to r248291:

In the comparison failure block of a cmpxchg expansion, the initial
ldrex/ldxr will not be followed by a matching strex/stxr.
On ARM/AArch64, this unnecessarily ties up the execution monitor,
which might have a negative performance impact on some uarchs.

Instead, release the monitor in the failure block.
The clrex instruction was designed for this: use it.

Also see ARMARM v8-A B2.10.2:
"Exclusive access instructions and Shareable memory locations".

Differential Revision: http://reviews.llvm.org/D13033

llvm-svn: 248294
2015-09-22 17:22:58 +00:00
Ahmed Bougacha 07a844d758 [AArch64] Emit clrex in the expanded cmpxchg fail block.
In the comparison failure block of a cmpxchg expansion, the initial
ldrex/ldxr will not be followed by a matching strex/stxr.
On ARM/AArch64, this unnecessarily ties up the execution monitor,
which might have a negative performance impact on some uarchs.

Instead, release the monitor in the failure block.
The clrex instruction was designed for this: use it.

Also see ARMARM v8-A B2.10.2:
"Exclusive access instructions and Shareable memory locations".

Differential Revision: http://reviews.llvm.org/D13033

llvm-svn: 248291
2015-09-22 17:21:44 +00:00
Stephen Canon 8216d88511 Don't raise inexact when lowering ceil, floor, round, trunc.
The C standard has historically not specified whether or not these functions should raise the inexact flag. Traditionally on Darwin, these functions *did* raise inexact, and the llvm lowerings followed that conventions. n1778 (C bindings for IEEE-754 (2008)) clarifies that these functions should not set inexact. This patch brings the lowerings for arm64 and x86 in line with the newly specified behavior.  This also lets us fold some logic into TD patterns, which is nice.

Differential Revision: http://reviews.llvm.org/D12969

llvm-svn: 248266
2015-09-22 11:43:17 +00:00
Simon Pilgrim 1cad0cd3ce [X86][SSE] Match zero/any extension shuffles that don't start from the first element
This patch generalizes the lowering of shuffles as zero extensions to allow extensions that don't start from the first element. It now recognises extensions starting anywhere in the lower 128-bits or at the start of any higher 128-bit lane.

The motivation was to reduce the number of high cost pshufb calls, but it also improves the SSE2 case as well.

Differential Revision: http://reviews.llvm.org/D12561

llvm-svn: 248250
2015-09-22 08:16:08 +00:00
Simon Pilgrim 4003ed2da3 [DAGCombiner] Improve FMA support for interpolation patterns
This patch adds support for combining patterns such as (FMUL(FADD(1.0, x), y)) and (FMUL(FSUB(x, 1.0), y)) to their FMA equivalents.

This is useful in particular for linear interpolation cases such as (FADD(FMUL(x, t), FMUL(y, FSUB(1.0, t))))

Differential Revision: http://reviews.llvm.org/D13003

llvm-svn: 248210
2015-09-21 20:32:48 +00:00
Jeroen Ketema 41681a5329 [ARM] Do not scale vext with a factor
The vext pseudo-instruction takes the number of elements that need to be
extracted, not the number of bytes. Hence, use the number of elements
directly instead of scaling them with a factor.

Reviewers: Silviu Baranga, James Molloy
(not reflected in the differential revision)

Differential Revision: http://reviews.llvm.org/D12974

llvm-svn: 248208
2015-09-21 20:28:04 +00:00
Ulrich Weigand 126caeb043 [SystemZ] Fix expansion of ISD::FPOW and ISD::FSINCOS
The ISD::FPOW and ISD::FSINCOS opcodes default to Legal, but there
is no legal instruction for those on SystemZ.  This could cause
LLVM internal errors.  Fixed by setting the operation action to
Expand for those opcodes.

Also added test cases for all other LLVM IR intrinsics that should
generate a library call.  (Those already work correctly since the
default operation action is fine.)

llvm-svn: 248180
2015-09-21 17:35:45 +00:00
Matt Arsenault b774834429 DAGCombiner: Replace store of FP constant after attemping store merges
If storing multiple FP constants, some subset of the stores
would be replaced with integers due to visit order, so
MergeConsecutiveStores would only partially merge
these.

llvm-svn: 248169
2015-09-21 15:59:46 +00:00
Sanjay Patel bab5d6c636 add test file ahead of any functional changes for PR22428
llvm-svn: 248123
2015-09-20 15:58:00 +00:00
Simon Pilgrim c6a553241c [X86][SSE] Intrinsics builtins test refresh. NFCI
llvm-svn: 248122
2015-09-20 15:41:35 +00:00
Igor Breger b7e1f9d680 AVX512: Implemented encoding and intrinsics for vcmpss/sd.
Added tests for intrinsics and encoding.

Differential Revision: http://reviews.llvm.org/D12593

llvm-svn: 248121
2015-09-20 15:15:10 +00:00
Asaf Badouh 2744d21fb8 [X86][AVX512] extend support in Scalar conversion
add scalar FP to Int conversion with truncation intrinsics
add scalar conversion FP32 from/to FP64 intrinsics
add rounding mode and SAE mode encoding for these intrinsics

Differential Revision: http://reviews.llvm.org/D12665

llvm-svn: 248117
2015-09-20 14:31:19 +00:00
Igor Breger 4c4cd789c9 AVX512: vsqrtss/sd encoding and intrinsics implementation.
Added tests for intrinsics and encoding.

Differential Revision: http://reviews.llvm.org/D12102

llvm-svn: 248116
2015-09-20 09:13:41 +00:00
Asaf Badouh 572bbceecc [X86][AVX512DQ] Add fpclass instruction
Differential Revision: http://reviews.llvm.org/D12931

llvm-svn: 248115
2015-09-20 08:46:07 +00:00
Michael Kuperstein 58e86bc893 [X86] Fix sitofp and uitofp instruction matching failures with long double and avx512
The operation action for i32 and i64 cannot be set to legal, as long double 
needs custom lowering.

Patch by: mitch.l.bodart@intel.com
Differential Revision: http://reviews.llvm.org/D12372

llvm-svn: 248114
2015-09-20 08:12:17 +00:00
Igor Breger 1d55f20bee AVX512: Implemented intrinsics for vshuff32x4, vshuff64x2, vshufi64x2, vshufi32x4
Added tests for intrinsics.

Differential Revision: http://reviews.llvm.org/D12525

llvm-svn: 248113
2015-09-20 07:18:53 +00:00
Igor Breger 0ede3cbb5c AVX512: Implement instructions encoding, lowering and intrinsics
vinserti64x4, vinserti64x2, vinserti32x8, vinserti32x4, vinsertf64x4, vinsertf64x2, vinsertf32x8, vinsertf32x4
Added tests for encoding, lowering and intrinsics.

Differential Revision: http://reviews.llvm.org/D11893

llvm-svn: 248111
2015-09-20 06:52:42 +00:00
Simon Pilgrim 27f81776ad [X86][AVX2] Use general sext IR for vpmovsx stack folding tests
llvm-svn: 248093
2015-09-19 17:04:18 +00:00
Simon Pilgrim d0448ee59f [X86][SSE] Vectorize CTTZ + CTTZ_ZERO_UNDEF
Now that we have fast vector CTPOP implementations we can use this to speed up vector CTTZ using the pattern (cttz(x) = ctpop((x & -x) - 1))

Additionally, for AVX512CD that provides lzcnt instructions we can use the pattern (cttz_undef(x) = (width - 1) - ctlz(x & -x))

Differential Revision: http://reviews.llvm.org/D12663

llvm-svn: 248091
2015-09-19 13:22:57 +00:00
Matt Arsenault cc5d106263 AMDGPU: Add failing testcase for live interval construction
llvm-svn: 248067
2015-09-19 00:03:56 +00:00
Cong Hou d40105d321 Update edge weights properly when merging blocks in if-conversion.
In if-conversion, there is a utility function MergeBlocks() that is used to merge blocks. However, when new edges are built in this function the edge weight is either not provided or not updated properly, leading to a modified CFG with incorrect edge weights. This patch corrects this issue.

Differential Revision: http://reviews.llvm.org/D12513

llvm-svn: 248030
2015-09-18 20:22:41 +00:00
Eric Christopher a835956bda Limit the range of processors supported by ARM fast isel to v6 or
later as that's all that is tested right now.

Fixes PR24858.

llvm-svn: 248027
2015-09-18 20:08:18 +00:00
Cong Hou f9f9ffb98b Scaling up values in ARMBaseInstrInfo::isProfitableToIfCvt() before they are scaled by a probability to avoid precision issue.
In ARMBaseInstrInfo::isProfitableToIfCvt(), there is a simple cost model in which the number of cycles is scaled by a probability to estimate the cost. However, when the number of cycles is small (which is usually the case), there is a precision issue after the computation. To avoid this issue, this patch scales those cycles by 1024 (chosen to make the multiplication a litter faster) before they are scaled by the probability. Other variables are also scaled up for the final comparison.

Differential Revision: http://reviews.llvm.org/D12742

llvm-svn: 248018
2015-09-18 18:19:40 +00:00
Matthias Braun 0b7d6c14c9 SelectionDAG: Introduce PersistentID to SDNode for assert builds.
This gives us more human readable numbers to identify nodes in debug
dumps.

Before:
  0x7fcbd9700160: ch = EntryToken

  0x7fcbd985c7c8: i64 = Register %RAX

   ...

      0x7fcbd9700160: <multiple use>
    0x7fcbd985c578: i64,ch = MOV64rm 0x7fcbd985c6a0, 0x7fcbd985cc68, 0x7fcbd985c200, 0x7fcbd985cd90, 0x7fcbd985ceb8, 0x7fcbd9700160<Mem:LD8[@foo]> [ORD=2]

  0x7fcbd985c8f0: ch,glue = CopyToReg 0x7fcbd9700160, 0x7fcbd985c7c8, 0x7fcbd985c578 [ORD=3]

    0x7fcbd985c7c8: <multiple use>
    0x7fcbd985c8f0: <multiple use>
    0x7fcbd985c8f0: <multiple use>
  0x7fcbd985ca18: ch = RETQ 0x7fcbd985c7c8, 0x7fcbd985c8f0, 0x7fcbd985c8f0:1 [ORD=3]

Now:
  t0: ch = EntryToken

  t5: i64 = Register %RAX

    ...

      t0: <multiple use>
    t3: i64,ch = MOV64rm t10, t12, t11, t13, t14, t0<Mem:LD8[@foo]> [ORD=2]

  t6: ch,glue = CopyToReg t0, t5, t3 [ORD=3]

    t5: <multiple use>
    t6: <multiple use>
    t6: <multiple use>
  t7: ch = RETQ t5, t6, t6:1 [ORD=3]

Differential Revision: http://reviews.llvm.org/D12564

llvm-svn: 248010
2015-09-18 17:41:00 +00:00
Geoff Berry 43ec15e57e [AArch64] Improved bitfield instruction selection.
Summary:
For bitfield insert OR matching, check both operands for larger pattern
first before checking for smaller pattern.

Add pattern for unsigned bitfield insert-in-zero done with SHL+AND.

Resolves PR21631.

Reviewers: jmolloy, t.p.northover

Subscribers: aemerson, rengolin, llvm-commits, mcrosier

Differential Revision: http://reviews.llvm.org/D12908

llvm-svn: 248006
2015-09-18 17:11:53 +00:00
Quentin Colombet b4c6886215 [ShrinkWrap] Refactor the handling of infinite loop in the analysis.
- Strenghten the logic to be sure we hoist the restore point out of the current
  loop. (The fixes a bug with infinite loop, added as part of the patch.)
- Walk over the exit blocks of the current loop to conver to the desired restore
  point in one iteration of the update loop.

llvm-svn: 247958
2015-09-17 23:21:34 +00:00
David Majnemer 163b7f121c [WinEH] Fix tests broken by funclet-layout
llvm-svn: 247944
2015-09-17 21:11:12 +00:00
David Majnemer 978902309a [WinEH] Add a funclet layout pass
Windows EH funclets need to be contiguous.  The FuncletLayout pass will
ensure that the funclets are together and begin with a funclet entry MBB.

Differential Revision: http://reviews.llvm.org/D12943

llvm-svn: 247937
2015-09-17 20:45:18 +00:00
Reid Kleckner 5b8a46e771 [WinEH] Make funclet return instrs pseudo instrs
This makes catchret look more like a branch, and less like a weird use
of BlockAddress. It also lets us get away from
llvm.x86.seh.restoreframe, which relies on the old parentfpoffset label
arithmetic.

llvm-svn: 247936
2015-09-17 20:43:47 +00:00
Reid Kleckner b78585b3c2 Fix the test case I just committed
llvm-svn: 247905
2015-09-17 17:21:45 +00:00
Reid Kleckner ed17079b52 [WinEH] Add and use hasEHPadSuccessor instead of getLandingPadSuccessor
getLandingPadSuccessor assumes that each invoke can have at most one EH
pad successor, but WinEH invokes can have more than one. Two out of
three callers of getLandingPadSuccessor don't use the returned
landingpad, so we can make them use this simple predicate instead.

Eventually we'll have to circle back and fix SplitKit.cpp so that
register allocation works. Baby steps.

llvm-svn: 247904
2015-09-17 17:19:40 +00:00
Elena Demikhovsky 702a6adfaa AVX-512: shufflevector for i1 vectors <2 x i1> .. <64 x i1>
AVX-512 does not provide an instruction that shuffles mask register. So I do the following way:

mask-2-simd , shuffle simd , simd-2-mask

Differential Revision: http://reviews.llvm.org/D12727

llvm-svn: 247876
2015-09-17 06:53:12 +00:00
Reid Kleckner 813f1b65bc [WinEH] Rip out the landingpad-based C++ EH state numbering code
It never really worked, and the new code is working better every day.

llvm-svn: 247860
2015-09-16 22:14:46 +00:00
David Majnemer 67bff0d88b [WinEHPrepare] Turn terminatepad into a cleanuppad + call + cleanupret
The MSVC doesn't really support exception specifications so let's just
turn these into cleanuppads.  Later, we might use terminatepad to more
efficiently encode the "noexcept"-ness of a function body.

llvm-svn: 247848
2015-09-16 20:42:16 +00:00
Reid Kleckner b005d281c3 [WinEH] Pull Adjectives and CatchObj out of the catchpad arg list
Clang now passes the adjectives as an argument to catchpad.

Getting the CatchObj working is simply a matter of threading another
static alloca through codegen, first as an alloca, then as a frame
index, and finally as a frame offset.

llvm-svn: 247844
2015-09-16 20:16:27 +00:00
David Majnemer 459a64aed7 [WinEHPrepare] Provide a cloning mode which doesn't demote
We are experimenting with a new approach to saving and restoring SSA
values used across funclets: let the register allocator do the dirty
work for us.

However, this means that we need to be able to clone commoned blocks
without relying on demotion.

llvm-svn: 247835
2015-09-16 18:40:37 +00:00
Reid Kleckner 84ebff4a5e [WinEH] Skip state numbering when no EH pads are present
Otherwise we'd try to emit the thunk that passes the LSDA to
__CxxFrameHandler3. We don't emit the LSDA if there were no landingpads,
so we'd end up with an assembler error when trying to write the COFF
object.

llvm-svn: 247820
2015-09-16 17:19:44 +00:00
Dan Gohman 950a13cfa3 [WebAssembly] Check in an initial CFG Stackifier pass
This pass implements a simple algorithm for conversion from CFG to
wasm's structured control flow. It doesn't yet handle multiple-entry
loops; that will be added in a future patch.

It also adds initial support for switch statements.

Differential Revision: http://reviews.llvm.org/D12735

llvm-svn: 247818
2015-09-16 16:51:30 +00:00
Sanjay Patel a260701bbb propagate fast-math-flags on DAG nodes
After D10403, we had FMF in the DAG but disabled by default. Nick reported no crashing errors after some stress testing, 
so I enabled them at r243687. However, Escha soon notified us of a bug not covered by any in-tree regression tests: 
if we don't propagate the flags, we may fail to CSE DAG nodes because differing FMF causes them to not match. There is
one test case in this patch to prove that point.

This patch hopes to fix or leave a 'TODO' for all of the in-tree places where we create nodes that are FMF-capable. I 
did this by putting an assert in SelectionDAG.getNode() to find any FMF-capable node that was being created without FMF
( D11807 ). I then ran all regression tests and test-suite and confirmed that everything passes.

This patch exposes remaining work to get DAG FMF to be fully functional: (1) add the flags to non-binary nodes such as
FCMP, FMA and FNEG; (2) add the flags to intrinsics; (3) use the flags as conditions for transforms rather than the
current global settings.

Differential Revision: http://reviews.llvm.org/D12095

llvm-svn: 247815
2015-09-16 16:31:21 +00:00
Michael Kuperstein d926465342 [X86] Do not generate 64-bit pops of 32-bit GPRs.
When trying emit a stack adjustments using pops, frame lowering selects an
arbitrary free GPR. It should always select one from an appropriate class...
This fixes PR24649.

Patch by: amjad.aboud@intel.com
Differential Revision: http://reviews.llvm.org/D12609

llvm-svn: 247785
2015-09-16 11:27:20 +00:00
Mehdi Amini d178f4fc89 Make the default triple optional by allowing an empty string
When building LLVM as a (potentially dynamic) library that can be linked against
by multiple compilers, the default triple is not really meaningful.
We allow to explicitely set it to an empty string when configuring LLVM.
In this case, said "target independent" tests in the test suite that are using
the default triple are disabled by matching the newly available feature
"default_triple".

Reviewers: probinson, echristo
Differential Revision: http://reviews.llvm.org/D12660

From: Mehdi Amini <mehdi.amini@apple.com>
llvm-svn: 247775
2015-09-16 05:34:32 +00:00
Quentin Colombet 7b13976254 [ShrinkWrapping] Add a test case for r247710.
llvm-svn: 247713
2015-09-15 18:51:43 +00:00
Ulrich Weigand e861e6442c [SystemZ] Fix assertion failure in tryBuildVectorShuffle
Under certain circumstances, tryBuildVectorShuffle would attempt to
create a BUILD_VECTOR node with an invalid combination of types.
This happened when one of the components of the original BUILD_VECTOR
was itself a TRUNCATE node.  That TRUNCATE was stripped off during
intermediate processing to simplify code, but when adding the node
back to the result vector, we still need it to get the type right.

llvm-svn: 247694
2015-09-15 14:27:46 +00:00
Dan Gohman 311b488d76 [WebAssembly] Implement int64-to-int32 conversion.
llvm-svn: 247649
2015-09-15 00:55:19 +00:00
Jun Bum Lim 34b9bd0435 Improve ISel using across lane min/max reduction
In vectorized integer min/max reduction code, the final "reduce" step
is sub-optimal. In AArch64, this change wll combine :
  %svn0 = vector_shuffle %0, undef<2,3,u,u>
  %smax0 = smax %0, svn0
  %svn3 = vector_shuffle %smax0, undef<1,u,u,u>
  %sc = setcc %smax0, %svn3, gt
  %n0 = extract_vector_elt %sc, #0
  %n1 = extract_vector_elt %smax0, #0
  %n2 = extract_vector_elt $smax0, #1
  %result = select %n0, %n1, n2
becomes :
  %1 = smaxv %0
  %result = extract_vector_elt %1, 0

This change extends r246790.

llvm-svn: 247575
2015-09-14 16:19:52 +00:00
John Brawn 056e67865a [ARM] Extract shifts out of multiply-by-constant
Turning (op x (mul y k)) into (op x (lsl (mul y k>>n) n)) is beneficial when
we can do the lsl as a shifted operand and the resulting multiply constant is
simpler to generate.

Do this by doing the transformation when trying to select a shifted operand,
as that ensures that it actually turns out better (the alternative would be to
do it in PreprocessISelDAG, but we don't know for sure there if extracting the
shift would allow a shifted operand to be used).

Differential Revision: http://reviews.llvm.org/D12196

llvm-svn: 247569
2015-09-14 15:19:41 +00:00
Simon Pilgrim f8f86ab176 [X86][MMX] Added shuffle decodes for MMX/3DNow! shuffles.
Added shuffle decodes for MMX PUNPCK + PSHUFW shuffles.
Added shuffle decodes for 3DNow! PSWAPD shuffles.

llvm-svn: 247526
2015-09-13 11:28:45 +00:00
Elena Demikhovsky 8671fcbbd6 AVX-512: Fixed a bug in OR/XOR operations for 512-bit FP values on KNL.
KNL does not have VXORPS, VORPS for 512-bit values.
I use integer VPXOR, VPOR that actually do the same.

X86ISD::FXOR/FOR are generated as a result of FSUB combining.

Differential Revision: http://reviews.llvm.org/D12753

llvm-svn: 247523
2015-09-13 08:15:15 +00:00
Sanjay Patel 8b960d22ad [x86] enable machine combiner reassociations for 128-bit vector logical integer insts (2nd try)
The changes in:
test/CodeGen/X86/machine-cp.ll
are just due to scheduling differences after some logic instructions were reassociated.

llvm-svn: 247516
2015-09-12 19:47:50 +00:00
Simon Pilgrim 42c834bad5 [X86] Added i1 vector sextload tests
llvm-svn: 247509
2015-09-12 15:36:41 +00:00
Simon Pilgrim 779bcf3e3d [X86][FMA] Refreshed fma tests
llvm-svn: 247508
2015-09-12 15:33:05 +00:00
Sanjay Patel 99f7370a79 revert r247506; need to verify changes in existing tests
llvm-svn: 247507
2015-09-12 15:27:31 +00:00
Sanjay Patel 08755c7dbc [x86] enable machine combiner reassociations for 128-bit vector logical integer insts
llvm-svn: 247506
2015-09-12 14:58:04 +00:00
Simon Pilgrim 8ee50e9cff [X86][SSE] Use general sext IR for (v)pmovsx stack folding tests
llvm-svn: 247502
2015-09-12 11:45:24 +00:00
Akira Hatanaka bc497c93f5 Use function attribute "stackrealign" to decide whether stack
realignment should be forced.

With this commit, we can now force stack realignment when doing LTO and
do so on a per-function basis. Also, add a new cl::opt option
"stackrealign" to CommandFlags.h which is used to force stack
realignment via llc's command line.

Out-of-tree projects currently using -force-align-stack to force stack
realignment should make changes to attach the attribute to the functions
in the IR.

Differential Revision: http://reviews.llvm.org/D11814

llvm-svn: 247450
2015-09-11 18:54:38 +00:00
David Majnemer 0e70598a5b [X86] Make sure startproc/endproc are paired
We used different conditions to determine if we should emit startproc vs
endproc.  Use the same condition to ensure that they will always be
paired.

This fixes PR24374.

llvm-svn: 247435
2015-09-11 17:34:34 +00:00
Reid Kleckner 5dbee7baef [IR] Print the label operands of a catchpad like an invoke
The rest of the EH pads are fine, since they have at most one label and
take fewer operands for the personality.

Old catchpad vs. new:
  %5 = catchpad [i8* bitcast (i32 ()* @"\01?filt$0@0@main@@" to i8*)] to label %__except.ret.10 unwind label %catchendblock.9
-----
  %5 = catchpad [i8* bitcast (i32 ()* @"\01?filt$0@0@main@@" to i8*)]
          to label %__except.ret.10 unwind label %catchendblock.9

llvm-svn: 247433
2015-09-11 17:27:52 +00:00
David Blaikie 2f40830dde [opaque pointer type] Add textual IR support for explicit type parameter for global aliases
update.py:
import fileinput
import sys
import re

alias_match_prefix = r"(.*(?:=|:|^)\s*(?:external |)(?:(?:private|internal|linkonce|linkonce_odr|weak|weak_odr|common|appending|extern_weak|available_externally) )?(?:default |hidden |protected )?(?:dllimport |dllexport )?(?:unnamed_addr |)(?:thread_local(?:\([a-z]*\))? )?alias"
plain = re.compile(alias_match_prefix + r" (.*?))(| addrspace\(\d+\) *)\*($| *(?:%|@|null|undef|blockaddress|addrspacecast|\[\[[a-zA-Z]|\{\{).*$)")
cast  = re.compile(alias_match_prefix + r") ((?:bitcast|inttoptr|addrspacecast)\s*\(.* to (.*?)(| addrspace\(\d+\) *)\*\)\s*(?:;.*)?$)")
gep   = re.compile(alias_match_prefix + r") ((?:getelementptr)\s*(?:inbounds)?\s*\((?P<type>.*), (?P=type)(?:\s*addrspace\(\d+\)\s*)?\* .*\)\s*(?:;.*)?$)")

def conv(line):
  m = re.match(cast, line)
  if m:
    return m.group(1) + " " + m.group(3) + ", " + m.group(2)
  m = re.match(gep, line)
  if m:
    return m.group(1) + " " + m.group(3) + ", " + m.group(2)
  m = re.match(plain, line)
  if m:
    return m.group(1) + ", " + m.group(2) + m.group(3) + "*" + m.group(4) + "\n"
  return line

for line in sys.stdin:
  sys.stdout.write(conv(line))

apply.sh:
for name in "$@"
do
  python3 `dirname "$0"`/update.py < "$name" > "$name.tmp" && mv "$name.tmp" "$name"
  rm -f "$name.tmp"
done

The actual commands:
From llvm/src:
find test/ -name *.ll | xargs ./apply.sh
From llvm/src/tools/clang:
find test/ -name *.mm -o -name *.m -o -name *.cpp -o -name *.c | xargs -I '{}' ../../apply.sh "{}"
From llvm/src/tools/polly:
find test/ -name *.ll | xargs ./apply.sh

llvm-svn: 247378
2015-09-11 03:22:04 +00:00
Reid Kleckner da6dcc5d92 [WinEH] Push and pop EBP for 32-bit funclets
The Win32 EH runtime caller does not preserve EBP, even though it does
preserve the CSRs (EBX, ESI, EDI) for us. The result was that each
finally funclet call would leave the frame pointer off by 12 bytes.

llvm-svn: 247348
2015-09-10 22:00:02 +00:00
James Y Knight 1f3e6af7d0 [SPARC] Switch to the Machine Scheduler.
The (mostly-deprecated) SelectionDAG-based ILPListDAGScheduler scheduler
was making poor scheduling decisions, causing high register pressure and
extraneous register spills.

Switching to the newer machine scheduler generates better code -- even
without there being a machine model defined for SPARC yet.

(Actually committing the test changes too, this time, unlike r247315)

llvm-svn: 247343
2015-09-10 21:49:06 +00:00
Reid Kleckner 7bb20bd69e Fix SEH state numbering algorithm to handle cleanupendpads
WinEHPrepare's new coloring algorithm really expects to see
cleanupendpads now, so Clang will start emitting them soon.

llvm-svn: 247341
2015-09-10 21:46:36 +00:00
David Blaikie a7970f3cf8 Tidy up some alias syntax to make explicit pointer type migration easier
llvm-svn: 247312
2015-09-10 18:03:45 +00:00
Joseph Tremoulet f3aff31401 [WinEH] Fix single-block cleanup coloring
Summary:
The coloring code in WinEHPrepare queues cleanuprets' successors with the
correct color (the parent one) when it sees their cleanuppad, and so later
when iterating successors knows to skip processing cleanuprets since
they've already been queued.  This latter check was incorrectly under an
'else' condition and so inadvertently was not kicking in for single-block
cleanups.  This change sinks the check out of the 'else' to fix the bug.

Reviewers: majnemer, andrew.w.kaylor, rnk

Subscribers: llvm-commits

Differential Revision: http://reviews.llvm.org/D12751

llvm-svn: 247299
2015-09-10 16:51:25 +00:00
Alex Lorenz 0153e59935 Fix PR 24724 - The implicit register verifier shouldn't assume certain operand
order.

The implicit register verifier in the MIR parser should only check if the
instruction's default implicit operands are present in the instruction. It
should not check the order in which they occur.

llvm-svn: 247283
2015-09-10 14:04:34 +00:00
Igor Breger 7f69a99c54 AVX512: Implemented encoding and intrinsics for
vextracti64x4 ,vextracti64x2, vextracti32x8, vextracti32x4, vextractf64x4, vextractf64x2, vextractf32x8, vextractf32x4
Added tests for intrinsics and encoding.

Differential Revision: http://reviews.llvm.org/D11802

llvm-svn: 247276
2015-09-10 12:54:54 +00:00
Silviu Baranga df9ce8408a [DAGCombine] Truncate BUILD_VECTOR operators if necessary when constant folding vectors
Summary:
The BUILD_VECTOR node will truncate its operators to match the
type. We need to take this into account when constant folding -
we need to perform a truncation before constant folding the elements.
This is because the upper bits can change the result, depending on
the operation type (for example this is the case for min/max).

This change also adds a regression test.

Reviewers: jmolloy

Subscribers: jmolloy, llvm-commits

Differential Revision: http://reviews.llvm.org/D12697

llvm-svn: 247265
2015-09-10 10:34:34 +00:00
James Molloy 8c995a93ce [ARM] Do not use vtrn for vectorshuffle if the order is reversed
The tests in isVTRNMask and isVTRN_v_undef_Mask should also check that the elements of the upper and lower half of the vectorshuffle occur in the correct order when both halves are used. Without this test the code assumes that it is correct to use vector transpose (vtrn) for the masks <1, 1, 0, 0> and <1, 3, 0, 2>, among others, but the transpose actually incorrectly generates shuffles for <0, 0, 1, 1> and <0, 2, 1, 3> in this case.

Patch by Jeroen Ketema!

llvm-svn: 247254
2015-09-10 08:42:28 +00:00
Kit Barton d3b904d440 Enable the shrink wrapping optimization for PPC64.
The changes in this patch are as follows:
  1. Modify the emitPrologue and emitEpilogue methods to work properly when the prologue and epilogue blocks are not the first/last blocks in the function
  2. Fix a bug in PPCEarlyReturn optimization caused by an empty entry block in the function
  3. Override the runShrinkWrap PredicateFtor (defined in TargetMachine) to check whether shrink wrapping should run:
      Shrink wrapping will run on PPC64 (Little Endian and Big Endian) unless -enable-shrink-wrap=false is specified on command line

A new test case, ppc-shrink-wrapping.ll was created based on the existing shrink wrapping tests for x86, arm, and arm64.

Phabricator review: http://reviews.llvm.org/D11817

llvm-svn: 247237
2015-09-10 01:55:44 +00:00
Ahmed Bougacha 05541459fa [AArch64] Match FI+offset in STNP addressing mode.
First, we need to teach isFrameOffsetLegal about STNP.
It already knew about the STP/LDP variants, but those were probably
never exercised, because it's only the load/store optimizer that
generates STP/LDP, and the only user of the method is frame lowering,
which runs earlier.
The STP/LDP cases were wrong: they didn't take into account the fact
that they return two results, not one, so the immediate offset will be
the 4th operand, not the 3rd.

Follow-up to r247234.

llvm-svn: 247236
2015-09-10 01:54:43 +00:00
Ahmed Bougacha c0ac38d584 [AArch64] Match base+offset in STNP addressing mode.
Followup to r247231.

llvm-svn: 247234
2015-09-10 01:48:29 +00:00
Ahmed Bougacha b8886b517d [AArch64] Support selecting STNP.
We could go through the load/store optimizer and match STNP where
we would have matched a nontemporal-annotated STP, but that's not
reliable enough, as an opportunistic optimization.
Insetad, we can guarantee emitting STNP, by matching them at ISel.
Since there are no single-input nontemporal stores, we have to
resort to some high-bits-extracting trickery to generate an STNP
from a plain store.

Also, we need to support another, LDP/STP-specific addressing mode,
base + signed scaled 7-bit immediate offset.
For now, only match the base. Let's make it smart separately.

Part of PR24086.

llvm-svn: 247231
2015-09-10 01:42:28 +00:00
Reid Kleckner 7878391208 [WinEH] Add codegen support for cleanuppad and cleanupret
All of the complexity is in cleanupret, and it mostly follows the same
codepaths as catchret, except it doesn't take a return value in RAX.

This small example now compiles and executes successfully on win32:
  extern "C" int printf(const char *, ...) noexcept;
  struct Dtor {
    ~Dtor() { printf("~Dtor\n"); }
  };
  void has_cleanup() {
    Dtor o;
    throw 42;
  }
  int main() {
    try {
      has_cleanup();
    } catch (int) {
      printf("caught it\n");
    }
  }

Don't try to put the cleanup in the same function as the catch, or Bad
Things will happen.

llvm-svn: 247219
2015-09-10 00:25:23 +00:00
Reid Kleckner 94b704c469 [SEH] Emit 32-bit SEH tables for the new EH IR
The 32-bit tables don't actually contain PC range data, so emitting them
is incredibly simple.

The 64-bit tables, on the other hand, use the same table for state
numbering as well as label ranges. This makes things more difficult, so
it will be implemented later.

llvm-svn: 247192
2015-09-09 21:10:03 +00:00
Dan Gohman 5e0668426c [WebAssembly] Update target datalayout strings.
llvm-svn: 247187
2015-09-09 20:54:31 +00:00
Renato Golin db7ea86bf4 Revert "AVX512: Implemented encoding and intrinsics for vextracti64x4 ,vextracti64x2, vextracti32x8, vextracti32x4, vextractf64x4, vextractf64x2, vextractf32x8, vextractf32x4 Added tests for intrinsics and encoding."
This reverts commit r247149, as it was breaking numerous buildbots of varied architectures.

llvm-svn: 247177
2015-09-09 19:44:40 +00:00
Dan Gohman f71abef701 [WebAssembly] Implement calls with void return types.
llvm-svn: 247158
2015-09-09 16:13:47 +00:00
Tom Stellard 9a197676b1 AMDGPU/SI: Fold operands through REG_SEQUENCE instructions
Summary:
This helps mostly when we use add instructions for address calculations
that contain immediates.

Reviewers: arsenm

Subscribers: arsenm, llvm-commits

Differential Revision: http://reviews.llvm.org/D12256

llvm-svn: 247157
2015-09-09 15:43:26 +00:00
Igor Breger ac29a82921 AVX512: Implemented encoding and intrinsics for
vextracti64x4 ,vextracti64x2, vextracti32x8, vextracti32x4, vextractf64x4, vextractf64x2, vextractf32x8, vextractf32x4
Added tests for intrinsics and encoding.

Differential Revision: http://reviews.llvm.org/D11802

llvm-svn: 247149
2015-09-09 14:35:09 +00:00
Alex Lorenz b9a68dbcae Fix PR 24633 - Handle undef values when parsing standalone constants.
llvm-svn: 247145
2015-09-09 13:44:33 +00:00
Daniel Sanders 2038747fce Fix vector splitting for extract_vector_elt and vector elements of <8-bits.
Summary:
One of the vector splitting paths for extract_vector_elt tries to lower:
    define i1 @via_stack_bug(i8 signext %idx) {
      %1 = extractelement <2 x i1> <i1 false, i1 true>, i8 %idx
      ret i1 %1
    }
to:
    define i1 @via_stack_bug(i8 signext %idx) {
      %base = alloca <2 x i1>
      store <2 x i1> <i1 false, i1 true>, <2 x i1>* %base
      %2 = getelementptr <2 x i1>, <2 x i1>* %base, i32 %idx
      %3 = load i1, i1* %2
      ret i1 %3
    }
However, the elements of <2 x i1> are not byte-addressible. The result of this
is that the getelementptr expands to '%base + %idx * (1 / 8)' which simplifies
to '%base + %idx * 0', and then simply '%base' causing all values of %idx to
extract element zero.

This commit fixes this by promoting the vector elements of <8-bits to i8 before
splitting the vector.

This fixes a number of test failures in pocl.

Reviewers: pekka.jaaskelainen

Subscribers: pekka.jaaskelainen, llvm-commits

Differential Revision: http://reviews.llvm.org/D12591

llvm-svn: 247128
2015-09-09 09:53:20 +00:00
Dan Gohman e590b33bf8 [WebAssembly] Fix lowering of calls with more than one argument.
llvm-svn: 247118
2015-09-09 01:52:45 +00:00
Matt Arsenault acd68b58ae SelectionDAG: Support Expand of f16 extloads
Currently this hits an assert that extload should
always be supported, which assumes integer extloads.

This moves a hack out of SI's argument lowering and
is covered by existing tests.

llvm-svn: 247113
2015-09-09 01:12:27 +00:00
Dan Gohman 4f52e00ecb [WebAssembly] Implement WebAssemblyInstrInfo::copyPhysReg
llvm-svn: 247110
2015-09-09 00:52:47 +00:00
Reid Kleckner 51189f0a1d [WinEH] Avoid creating MBBs for LLVM BBs that cannot contain code
Typically these are catchpads, which hold data used to decide whether to
catch the exception or continue unwinding. We also shouldn't create MBBs
for catchendpads, cleanupendpads, or terminatepads, since no real code
can live in them.

This fixes a problem where MI passes (like the register allocator) would
try to put code into catchpad blocks, which are not executed by the
runtime. In the new world, blocks ending in invokes now have many
possible successors.

llvm-svn: 247102
2015-09-08 23:28:38 +00:00
Reid Kleckner df1295173f [WinEH] Emit prologues and epilogues for funclets
Summary:
32-bit funclets have short prologues that allocate enough stack for the
largest call in the whole function. The runtime saves CSRs for the
funclet. It doesn't restore CSRs after we finally transfer control back
to the parent funciton via a CATCHRET, but that's a separate issue.
32-bit funclets also have to adjust the incoming EBP value, which is
what llvm.x86.seh.recoverframe does in the old model.

64-bit funclets need to spill CSRs as normal. For simplicity, this just
spills the same set of CSRs as the parent function, rather than trying
to compute different CSR sets for the parent function and each funclet.
64-bit funclets also allocate enough stack space for the largest
outgoing call frame, like 32-bit.

Reviewers: majnemer

Subscribers: llvm-commits

Differential Revision: http://reviews.llvm.org/D12546

llvm-svn: 247092
2015-09-08 22:44:41 +00:00
Eric Christopher 71f6e2f568 Fix the PPC CTR Loop pass to look for calls to the intrinsics that
read CTR and count them as reading the CTR.

llvm-svn: 247083
2015-09-08 22:14:58 +00:00
Derek Schuff 45c832c5d8 Fix comments and RUN line in x86-64 stdarg test leftover from last commit
From http://reviews.llvm.org/D12346

llvm-svn: 247070
2015-09-08 20:58:41 +00:00
Derek Schuff eef533f422 x32. Fixes a bug in how struct va_list is initialized in x32
Summary: This patch modifies X86TargetLowering::LowerVASTART so that
struct va_list is initialized with 32 bit pointers in x32. It also
includes tests that call @llvm.va_start() for x32.

Patch by João Porto

Subscribers: llvm-commits, hjl.tools
Differential Revision: http://reviews.llvm.org/D12346

llvm-svn: 247069
2015-09-08 20:51:31 +00:00
Derek Schuff ee4e947e23 x32. Fixes a bug in i8mem_NOREX declaration.
The old implementation assumed LP64 which is broken for x32.  Specifically, the
MOVE8rm_NOREX and MOVE8mr_NOREX, when selected, would cause a 'Cannot emit
physreg copy instruction' error message to be reported.

This patch also enable the h-register*ll tests for x32.

Differential Revision: http://reviews.llvm.org/D12336

Patch by João Porto

llvm-svn: 247058
2015-09-08 19:47:15 +00:00
Matt Arsenault 966a94f861 AMDGPU: Handle sub of constant for DS offset folding
sub C, x - > add (sub 0, x), C for DS offsets.

This is mostly to fix regressions that show up when
SeparateConstOffsetFromGEP is enabled.

llvm-svn: 247054
2015-09-08 19:34:22 +00:00
David Blaikie 12dd3c4ebb Fix CPP Backend for GEP API changes for opaque pointer types
Based on a patch by Jerome Witmann.

llvm-svn: 247047
2015-09-08 18:42:29 +00:00
JF Bastien 749ed88aa5 WebAssembly: NFC rename shr/sar
Renamed from: https://github.com/WebAssembly/design/pull/332

llvm-svn: 247028
2015-09-08 17:21:21 +00:00
Dan Gohman d4a12d25ff [WebAssembly] Temporarily disable this test, as it depends on additional patches that aren't yet checked in.
llvm-svn: 247011
2015-09-08 13:21:12 +00:00
Igor Breger a54a1a84dd AVX512: kunpck encoding implementation
Added tests for encoding.

Differential Revision: http://reviews.llvm.org/D12061

llvm-svn: 247010
2015-09-08 13:10:00 +00:00
Dan Gohman 25d2a0dda4 [WebAssembly] Enable SSA lowering and other pre-regalloc passes
llvm-svn: 247008
2015-09-08 12:39:25 +00:00
Daniel Sanders 808dfb8ba7 [mips] Reserve address spaces 1-255 for software use.
Summary: And define them to have noop casts with address spaces 0-255.

Reviewers: pekka.jaaskelainen

Subscribers: pekka.jaaskelainen, llvm-commits

Differential Revision: http://reviews.llvm.org/D12678

llvm-svn: 246990
2015-09-08 09:07:03 +00:00
Elena Demikhovsky e88038f235 AVX-512: Lowering for 512-bit vector shuffles.
Vector types: <8 x 64>, <16 x 32>, <32 x 16> float and integer.

Differential Revision: http://reviews.llvm.org/D10683

llvm-svn: 246981
2015-09-08 06:38:21 +00:00
Simon Pilgrim 15fc134865 [X86][MMX] Added missing stack folding tests for MMX/SSSE3 instructions
llvm-svn: 246949
2015-09-06 17:50:15 +00:00
Simon Pilgrim b38c09d7ff [X86][AVX512] Added 512-bit vector shift tests.
Only works for avx512f (dq) targets so far - need to add avx512bw tests once char/short shifts are supported.

llvm-svn: 246943
2015-09-06 13:36:32 +00:00
Hal Finkel 10aac5fd0e [SelectionDAG] Swap commutative binops before constant-based folding
In searching for a fix for the underlying code-quality bug highlighted by
r246937 (that SDAG simplification can lead to us generating an ISD::OR node
with a constant zero LHS), I ran across this:

We generically canonicalize commutative binary-operation nodes in SDAG getNode
so that, if only one operand is a constant, it will be on the RHS.  However, we
were doing this only after a bunch of constant-based simplification checks that
all assume this canonical form (that any constant will be on the RHS). Moving
the operand-swapping canonicalization prior to these checks seems like the
right thing to do (and, as it turns out, causes SDAG to completely fold away the
computation in test/CodeGen/ARM/2012-11-14-subs_carry.ll, just like InstCombine
would do).

llvm-svn: 246938
2015-09-06 05:42:13 +00:00
Hal Finkel ccf9259c00 [PowerPC] Don't commute trivial rlwimi instructions
To commute a trivial rlwimi instructions (meaning one with a full mask and zero
shift), we'd need to ability to form an all-zero mask (instead of an all-one
mask) using rlwimi. We can't represent this, however, and we'll miscompile code
if we try.

The code quality problem that this highlights (that SDAG simplification can
lead to us generating an ISD::OR node with a constant zero LHS) will be fixed
as a follow-up.

Fixes PR24719.

llvm-svn: 246937
2015-09-06 04:17:30 +00:00
Simon Pilgrim a1ceba8ab4 [X86] Updated vector lzcnt tests. Added missing vec512 tests.
llvm-svn: 246927
2015-09-05 11:56:30 +00:00
Simon Pilgrim 5cce4381ab [X86] Updated vector tzcnt tests. Added vec512 tests.
llvm-svn: 246922
2015-09-05 10:19:07 +00:00
Simon Pilgrim ba8ae5a814 [X86] Updated vector popcnt tests. Added vec512 tests.
llvm-svn: 246921
2015-09-05 09:59:59 +00:00
Hal Finkel b1518d6c24 [PowerPC] Fix and(or(x, c1), c2) -> rlwimi generation
PPCISelDAGToDAG has a transformation that generates a rlwimi instruction from
an input pattern that looks like this:

  and(or(x, c1), c2)

but the associated logic does not work if there are bits that are 1 in c1 but 0
in c2 (these are normally canonicalized away, but that can't happen if the 'or'
has other users. Make sure we abort the transformation if such bits are
discovered.

Fixes PR24704.

llvm-svn: 246900
2015-09-05 00:02:59 +00:00
Simon Pilgrim 0fccef6ac2 [X86][AVX] Test tidyup + regeneration. NFCI.
llvm-svn: 246863
2015-09-04 19:47:56 +00:00
Steven Wu 332eeca1e4 Fix the testcase in r246790
Using generic neon syntax to avoid test failure on apple platforms.

llvm-svn: 246833
2015-09-04 01:39:24 +00:00
Hal Finkel e6702ca0e2 [PowerPC] Try harder to find a base+offset when looking for consecutive accesses
When forming permutation-based unaligned vector loads, we need to know whether
it is valid to read ahead of the requested address by a full vector length.
Doing so is more efficient (and allows for more CSE with later loads), but
could trigger a page fault if invalid. To determine validity, we look for other
loads in the same block that access the relevant address range.

The relevant point here is that we need to do this as part of the process of
forming permutation-based vector loads, and this happens quite early in the
SDAG pipeline - specifically before many of the address calculations are fully
canonicalized. As a result, we need to try harder to recognize base+offset
address computations, because they still might appear as chain of adds
(base+offset+offset, for example). To account for this, we'll look through
chains of adds, accumulating the constant offsets.

llvm-svn: 246813
2015-09-03 22:37:44 +00:00
Hal Finkel 99d95328d6 [PowerPC] Compute the MMO offset for an unaligned load with signed arithmetic
If you compute the MMO offset using unsigned arithmetic, you end up with a
large positive offset instead of a small negative one. In theory, this could
cause bad instruction-scheduling decisions later.

I noticed this by inspection from the debug output, and using that for the
regression test is the best I can do right now.

llvm-svn: 246805
2015-09-03 21:12:15 +00:00
Chad Rosier 6c36eff1d6 [AArch64] Improve ISel using across lane addition reduction.
In vectorized add reduction code, the final "reduce" step is sub-optimal.
This change wll combine :

ext  v1.16b, v0.16b, v0.16b, #8
add  v0.4s, v1.4s, v0.4s
dup  v1.4s, v0.s[1]
add  v0.4s, v1.4s, v0.4s

into

addv s0, v0.4s

PR21371
http://reviews.llvm.org/D12325
Patch by Jun Bum Lim <junbuml@codeaurora.org>!

llvm-svn: 246790
2015-09-03 18:13:57 +00:00
Quentin Colombet 10ec8c4ca9 [ARM] Add a test case for revision 243956.
llvm-svn: 246785
2015-09-03 16:49:18 +00:00
Chad Rosier 08ef462d15 Revert "[AArch64] Improve load/store optimizer to handle LDUR + LDR."
This reverts commit r246769.

This appears to have broken Multisource/Benchmarks/tramp3d-v4.

llvm-svn: 246782
2015-09-03 16:41:28 +00:00