Commit Graph

27846 Commits

Elena Demikhovsky fcea06acb5 AVX-512: Added FMA instructions, intrinsics and tests for KNL and SKX targets
by Asaf Badouh

http://reviews.llvm.org/D6456

llvm-svn: 224764
2014-12-23 10:30:39 +00:00
Hal Finkel 6e27c6d450 [PowerPC] Don't mark the return-address slot as immutable
It is tempting to mark the fixed stack slot used to store the return address as
immutable when lowering @llvm.returnaddress(i32 0). Unfortunately, within the
function, it is not completely immutable: it is written during the function
prologue. When using post-RA instruction scheduling, the prologue instructions
are available for scheduling, and we're not free to interchange the order of a
particular store in the prologue with loads from that stack location.

Fixes PR21976.

llvm-svn: 224761
2014-12-23 09:45:06 +00:00
Elena Demikhovsky 3121449f0b AVX-512: BLENDM - fixed encoding of the broadcast version
Added more intrinsics and encoding tests.

llvm-svn: 224760
2014-12-23 09:36:28 +00:00
Michael Kuperstein f4536ea6e8 [DagCombine] Improve DAGCombiner BUILD_VECTOR when it has two sources of elements
This partially fixes PR21943.

For AVX, we go from:

vmovq   (%rsi), %xmm0
vmovq   (%rdi), %xmm1
vpermilps       $-27, %xmm1, %xmm2 ## xmm2 = xmm1[1,1,2,3]
vinsertps       $16, %xmm2, %xmm1, %xmm1 ## xmm1 = xmm1[0],xmm2[0],xmm1[2,3]
vinsertps       $32, %xmm0, %xmm1, %xmm1 ## xmm1 = xmm1[0,1],xmm0[0],xmm1[3]
vpermilps       $-27, %xmm0, %xmm0 ## xmm0 = xmm0[1,1,2,3]
vinsertps       $48, %xmm0, %xmm1, %xmm0 ## xmm0 = xmm1[0,1,2],xmm0[0]

To the expected:

vmovq   (%rdi), %xmm0
vmovhpd (%rsi), %xmm0, %xmm0
retq

Fixing this for AVX2 is still open.

Differential Revision: http://reviews.llvm.org/D6749

llvm-svn: 224759
2014-12-23 08:59:45 +00:00
Hal Finkel 04b16b51ec [PowerPC] Don't attempt a 64-bit pow2 division on PPC32
In r224033, in moving the signed power-of-2 division expansion into
BuildSDIVPow2, I accidentally made it possible to attempt the lowering for a
64-bit division on PPC32. This later asserts.

Fixes PR21928.

llvm-svn: 224758
2014-12-23 08:38:50 +00:00
Michael Liao 5313da3263 [SimplifyCFG] Revise common code sinking
- Fix the case where more than one common instruction derived from the same
  operand cannot be sunk. When a pair of values has more than one derived value
  in both branches, only one derived value could be sunk.
- Replace BB1 -> (BB2, PN) map with joint value map, i.e.
  map of (BB1, BB2) -> PN, which is more accurate to track common ops.

llvm-svn: 224757
2014-12-23 08:26:55 +00:00
Ahmed Bougacha 4553bff412 [ARM] Don't break alignment when combining base updates into load/stores.
r223862/r224203 tried to also combine base-updating load/stores.
There was a mistake there: the alignment was added as is as an operand to
the ARMISD::VLD/VST node.  However, the VLD/VST selection logic doesn't care
about less-than-standard alignment attributes.
For example, no matter the alignment of a v2i64 load (say 1), SelectVLD picks
VLD1q64 (because of the memory type).  But VLD1q64 ("vld1.64 {dXX, dYY}") is
8-aligned, per ARMARMv7a 3.2.1.
For the 1-aligned load, what we really want is VLD1q8.

This commit introduces bitcasts if necessary, and changes the vld/vst type to
one whose standard alignment matches the original load/store alignment.

Differential Revision: http://reviews.llvm.org/D6759

llvm-svn: 224754
2014-12-23 06:07:31 +00:00
Chandler Carruth c7d1e24b34 Revert r224739: Debug info: Teach SROA how to update debug info for
fragmented variables.

This caused codegen to start crashing when we built somewhat large
programs with debug info and optimizations. 'check-msan' hit it, and
I suspect a bootstrap would as well. I mailed a test case to the
review thread.

llvm-svn: 224750
2014-12-23 02:58:14 +00:00
Jim Grosbach 1bd0f3530e X86: Don't over-align combined loads.
When combining consecutive loads+inserts into a single vector load,
we should keep the alignment of the base load. Doing otherwise can, and does,
lead to using overly aligned instructions. In the included test case, for
example, we end up using a 32-byte vmovaps on a 16-byte-aligned value. Oops.

rdar://19190968

llvm-svn: 224746
2014-12-23 00:35:23 +00:00
Reid Kleckner ce0093344f Make musttail more robust for vector types on x86
Previously I tried to plug musttail into the existing vararg lowering
code. That turned out to be a mistake, because non-vararg calls use
significantly different register lowering, even on x86. For example, AVX
vectors are usually passed in registers to normal functions and memory
to vararg functions.  Now musttail uses a completely separate lowering.

Hopefully this can be used as the basis for non-x86 perfect forwarding.

Reviewers: majnemer

Differential Revision: http://reviews.llvm.org/D6156

llvm-svn: 224745
2014-12-22 23:58:37 +00:00
Adrian Prantl d9e64b6c08 Thumb1 frame lowering: Mark CFI instructions with the FrameSetup flag.
Followup to r224294:

ARM/AArch64: Attach the FrameSetup MIFlag to CFI instructions.
Debug info marks the first instruction without the FrameSetup flag
as being the end of the function prologue. Any CFI instructions in the
middle of the function prologue would cause debug info to end the prologue
too early and worse, attach the line number of the CFI instruction, which
incidentally is often 0.

llvm-svn: 224743
2014-12-22 23:09:14 +00:00
Bruno Cardoso Lopes bad65c3b70 [LCSSA] Handle PHI insertion in disjoint loops
Take two disjoint Loops L1 and L2.

LoopSimplify fails to simplify some loops (e.g. when indirect branches
are involved). In such situations, it can happen that an exit for L1 is
the header of L2. Thus, when we create PHIs in one such exit we are
also inserting PHIs in L2's header.

This could break LCSSA form for L2 because these inserted PHIs can also
have uses in L2 exits, which are never handled in the current
implementation. Provide a fix for this corner case and test that we
don't assert/crash on that.

Differential Revision: http://reviews.llvm.org/D6624

rdar://problem/19166231

llvm-svn: 224740
2014-12-22 22:35:46 +00:00
Adrian Prantl a47ace5901 Debug info: Teach SROA how to update debug info for fragmented variables.
This allows us to generate debug info for extremely advanced code such as

  typedef struct { long int a; int b;} S;

  int foo(S s) {
    return s.b;
  }

which at -O1 on x86_64 is codegen'd into

  define i32 @foo(i64 %s.coerce0, i32 %s.coerce1) #0 {
    ret i32 %s.coerce1, !dbg !24
  }

with this patch we emit the following debug info for this

  TAG_formal_parameter [3]
    AT_location( 0x00000000
                 0x0000000000000000 - 0x0000000000000006: rdi, piece 0x00000008, rsi, piece 0x00000004
                 0x0000000000000006 - 0x0000000000000008: rdi, piece 0x00000008, rax, piece 0x00000004 )
                 AT_name( "s" )
                 AT_decl_file( "/Volumes/Data/llvm/_build.ninja.release/test.c" )

Thanks to chandlerc, dblaikie, and echristo for their feedback on all
previous iterations of this patch!

llvm-svn: 224739
2014-12-22 22:26:00 +00:00
Reid Kleckner 5aabc06c16 Fix Windows unwind info for functions in sections other than .text
Previously we assumed the section name had the form .text$foo, which is
what we used to do for inline functions. If the dollar wasn't present,
we'd put unwind data in the .pdata and .xdata sections for the main
.text section, which is incorrect.

Fixes PR22001.

llvm-svn: 224738
2014-12-22 22:10:08 +00:00
Colin LeMahieu 4b1eac4dda [Hexagon] Adding memb instruction. Fixing whitespace in test from 224730.
llvm-svn: 224735
2014-12-22 21:40:43 +00:00
Colin LeMahieu af1e5de141 [Hexagon] Adding classes and load unsigned byte instruction, updating usages.
llvm-svn: 224730
2014-12-22 21:20:03 +00:00
Bruno Cardoso Lopes 811c173523 [x86] Add vector @llvm.ctpop intrinsic custom lowering
Currently, when ctpop is supported for scalar types, the expansion of
@llvm.ctpop.vXiY uses vector element extractions, insertions and individual
calls to @llvm.ctpop.iY. When not, expansion with bit-math operations is used
for the scalar calls.

Local haswell measurements show that we can improve vector @llvm.ctpop.vXiY
expansion in some cases by using a vector parallel bit-twiddling
approach, based on:

v = v - ((v >> 1) & 0x55555555);
v = (v & 0x33333333) + ((v >> 2) & 0x33333333);
v = (v + (v >> 4)) & 0x0F0F0F0F;
v = v + (v >> 8);
v = v + (v >> 16);
v = v & 0x0000003F;
(from http://graphics.stanford.edu/~seander/bithacks.html#CountBitsSetParallel)
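
For reference, here is the same parallel bit-twiddling as a small, self-contained
scalar C++ sketch (an illustration only; the actual lowering applies these steps
lane-wise on vector registers):

  #include <cstdint>

  // Population count of one 32-bit value via parallel partial sums.
  static inline uint32_t popcount32(uint32_t v) {
    v = v - ((v >> 1) & 0x55555555u);                 // 2-bit partial sums
    v = (v & 0x33333333u) + ((v >> 2) & 0x33333333u); // 4-bit partial sums
    v = (v + (v >> 4)) & 0x0F0F0F0Fu;                 // 8-bit partial sums
    v = v + (v >> 8);                                 // fold byte sums together
    v = v + (v >> 16);
    return v & 0x3Fu;                                 // total fits in 6 bits
  }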

When scalar ctpop isn't supported, the approach above performs better for
v2i64, v4i32, v4i64 and v8i32 (see numbers below). And even when scalar ctpop
is supported, this approach performs ~2x better for v8i32.

Here, x86_64 implies -march=corei7-avx without ctpop and x86_64h includes ctpop
support with -march=core-avx2.

== [x86_64h - new]
v8i32: 0.661685
v4i32: 0.514678
v4i64: 0.652009
v2i64: 0.324289
== [x86_64h - old]
v8i32: 1.29578
v4i32: 0.528807
v4i64: 0.65981
v2i64: 0.330707

== [x86_64 - new]
v8i32: 1.003
v4i32: 0.656273
v4i64: 1.11711
v2i64: 0.754064
== [x86_64 - old]
v8i32: 2.34886
v4i32: 1.72053
v4i64: 1.41086
v2i64: 1.0244

More work for other vector types will come next.

llvm-svn: 224725
2014-12-22 19:45:43 +00:00
Quentin Colombet 84f89ccd45 [CodeGenPrepare] Handle properly the promotion of operands when this does not
generate instructions.

Fixes PR21978.
Related to <rdar://problem/18310086>

llvm-svn: 224717
2014-12-22 18:11:52 +00:00
Elena Demikhovsky 949b0d46bf AVX-512: Added all forms of BLENDM instructions,
intrinsics, encoding tests for AVX-512F and skx instructions.

llvm-svn: 224707
2014-12-22 13:52:48 +00:00
Karthik Bhat bf662901c1 Lower multiply-negate operation to mneg on AArch64
This patch pattern-matches code such as:
neg	 w8, w8
mul	 w8, w9, w8
to
mneg	 w8, w8, w9

Review: http://reviews.llvm.org/D6754
llvm-svn: 224706
2014-12-22 13:38:58 +00:00
Rafael Espindola 2645d33977 Convert a few tests to FileCheck. NFC.
llvm-svn: 224705
2014-12-22 13:29:46 +00:00
Matt Arsenault 22b4c256e1 Enable (sext x) == C --> x == (trunc C) combine
Extend the existing code which handles this for zext. This makes this
more useful for targets with ZeroOrNegativeOne BooleanContent and
obsoletes a custom combine SI uses for i1 setcc (sext(i1), 0, setne)
since the constant will now be shrunk to i1.
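
As a hedged C++ illustration of the underlying equivalence (hypothetical helper
names, plain integers instead of SDAG nodes): comparing a sign-extended i8
against a wider constant C is the same as comparing the original i8 against the
truncated constant, provided C is representable in i8.

  #include <cstdint>

  static bool viaSext(int8_t x, int32_t c)  { return (int32_t)x == c; }
  // Equivalent only when INT8_MIN <= c <= INT8_MAX, i.e. truncation is lossless;
  // for an out-of-range C the original comparison is simply always false.
  static bool viaTrunc(int8_t x, int32_t c) { return x == (int8_t)c; }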

llvm-svn: 224691
2014-12-21 16:48:42 +00:00
Saleem Abdulrasool 0fa832002c ARM: further improve deprecated diagnosis (LDM)
The ARM ARM states:
  LDM/LDMIA/LDMFD:
    The SP can be in the list. However, ARM deprecates using these instructions
    with SP in the list.

    ARM deprecates using these instructions with both the LR and the PC in the
    list.

  LDMDA/LDMFA/LDMDB/LDMEA/LDMIB/LDMED:
    The SP can be in the list. However, instructions that include the SP in the
    list are deprecated.

    Instructions that include both the LR and the PC in the list are deprecated.

  POP:
    The SP can only be in the list before ARMv7. ARM deprecates any use of ARM
    instructions that include the SP, and the value of the SP after such an
    instruction is UNKNOWN.

    ARM deprecates the use of this instruction with both the LR and the PC in
    the list.

Attempt to diagnose use of deprecated forms of these instructions.  This mirrors
the previous changes to diagnose use of the deprecated forms of STM in ARM mode.

llvm-svn: 224682
2014-12-20 20:25:36 +00:00
David Majnemer 6eed0e0d20 This should have been part of r224676.
llvm-svn: 224677
2014-12-20 04:48:34 +00:00
David Majnemer b0362e4ee6 InstCombine: Squash an icmp+select into bitwise arithmetic
(X & INT_MIN) == 0 ? X ^ INT_MIN : X  into  X | INT_MIN
(X & INT_MIN) != 0 ? X ^ INT_MIN : X  into  X & INT_MAX
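
A quick exhaustive sanity check of both identities on 8-bit values (with
INT_MIN -> 0x80 and INT_MAX -> 0x7F); this is only an illustration of the
equivalence, not the transform itself:

  #include <cassert>
  #include <cstdint>

  int main() {
    for (int i = 0; i < 256; ++i) {
      uint8_t X = (uint8_t)i;
      uint8_t eq = ((X & 0x80) == 0) ? (uint8_t)(X ^ 0x80) : X;
      assert(eq == (uint8_t)(X | 0x80));
      uint8_t ne = ((X & 0x80) != 0) ? (uint8_t)(X ^ 0x80) : X;
      assert(ne == (uint8_t)(X & 0x7F));
    }
    return 0;
  }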

This fixes PR21993.

llvm-svn: 224676
2014-12-20 04:45:35 +00:00
David Majnemer 0b6a0b0257 InstSimplify: Optimize away pointless comparisons
(X & INT_MIN) ? X & INT_MAX : X  into  X & INT_MAX
(X & INT_MIN) ? X : X & INT_MAX  into  X
(X & INT_MIN) ? X | INT_MIN : X  into  X
(X & INT_MIN) ? X : X | INT_MIN  into  X | INT_MIN
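
The same kind of exhaustive 8-bit check (illustration only, not the pass) also
confirms these four identities:

  #include <cassert>
  #include <cstdint>

  int main() {
    for (int i = 0; i < 256; ++i) {
      uint8_t X = (uint8_t)i;
      bool neg = (X & 0x80) != 0;               // the (X & INT_MIN) condition
      assert((neg ? (uint8_t)(X & 0x7F) : X) == (uint8_t)(X & 0x7F));
      assert((neg ? X : (uint8_t)(X & 0x7F)) == X);
      assert((neg ? (uint8_t)(X | 0x80) : X) == X);
      assert((neg ? X : (uint8_t)(X | 0x80)) == (uint8_t)(X | 0x80));
    }
    return 0;
  }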

llvm-svn: 224669
2014-12-20 03:04:38 +00:00
Chandler Carruth b07d836577 [x86] Change the test added in r223774 to first check the spelling of
the error message for a bogus processor, and then look specifically for
that error message using FileCheck.

I actually tried to write the test this way at first, but drew a blank
on how to ensure the error message stayed in sync (oops). Now that I've
recalled how to do that, this is clearly better.

It also fixes an issue with a malloc implementation that actually prints
to stderr in all cases, which was causing problems for some builders it
seems.

llvm-svn: 224665
2014-12-20 02:19:22 +00:00
Elena Demikhovsky fb73ca516b Masked load and store codegen - fixed 128-bit vectors
The codegen failed on 128-bit types on AVX2.
I added patterns in td files and tests.

llvm-svn: 224647
2014-12-19 23:27:57 +00:00
Matt Arsenault dc10307524 R600/SI: Only form min/max with 1 use.
If the condition is used for something else, this increases
the number of instructions.

llvm-svn: 224646
2014-12-19 23:15:30 +00:00
Kevin Enderby 52e4ce4a53 Add printing the LC_ROUTINES load commands with llvm-objdump’s -private-headers.
llvm-svn: 224627
2014-12-19 22:25:22 +00:00
Reid Kleckner 93acac6cfc Add the ExceptionHandling::MSVC enumeration
It is intended to be used for a family of personality functions that
have similar IR preparation requirements. Typically when interoperating
with MSVC personality functions, bits of functionality need to be
outlined from the main function into helper functions. There is also
usually more than one landing pad per invoke, which does not match the
LLVM IR landingpad representation.

None of this is implemented yet. This change just adds a new enum that
is active for *-windows-msvc and delegates to the EH removal preparation
pass.  No functionality change for other targets.

llvm-svn: 224625
2014-12-19 22:19:48 +00:00
Sanjay Patel 1da5f1645b Model sqrtss as a binary operation with one source operand tied to the destination (PR14221)
This is a continuation of r167064 ( http://llvm.org/viewvc/llvm-project?view=revision&revision=167064 ).
That patch started to fix PR14221 ( http://llvm.org/bugs/show_bug.cgi?id=14221 ), but it was not completed. 

Differential Revision: http://reviews.llvm.org/D6330

llvm-svn: 224624
2014-12-19 22:16:28 +00:00
Tom Stellard c3d7eeb6e5 R600/SI: Make sure non-inline constants aren't folded into mubuf soffset operand
mubuf instructions now define the soffset field using the SCSrc_32
register class which indicates that only SGPRs and inline constants
are allowed.

llvm-svn: 224622
2014-12-19 22:15:30 +00:00
Kevin Enderby 186eac3c0c Add printing the LC_SUB_CLIENT load command with llvm-objdump’s -private-headers.
llvm-svn: 224616
2014-12-19 21:06:24 +00:00
Peter Collingbourne 0e5c068592 CodeGen: do not attempt to invalidate virtual registers for zero-sized phis.
llvm-svn: 224615
2014-12-19 20:50:07 +00:00
Colin LeMahieu 0f850bde0e [Hexagon] Removing old variants of instructions and updating references.
llvm-svn: 224612
2014-12-19 20:29:29 +00:00
Sanjay Patel 0428a5786e merge consecutive stores of extracted vector elements
Add a path to DAGCombiner::MergeConsecutiveStores() 
to combine multiple scalar stores when the store operands
are extracted vector elements. This is a partial fix for
PR21711 ( http://llvm.org/bugs/show_bug.cgi?id=21711 ).

For the new test case, codegen improves from:

   vmovss  %xmm0, (%rdi)
   vextractps      $1, %xmm0, 4(%rdi)
   vextractps      $2, %xmm0, 8(%rdi)
   vextractps      $3, %xmm0, 12(%rdi)
   vextractf128    $1, %ymm0, %xmm0
   vmovss  %xmm0, 16(%rdi)
   vextractps      $1, %xmm0, 20(%rdi)
   vextractps      $2, %xmm0, 24(%rdi)
   vextractps      $3, %xmm0, 28(%rdi)
   vzeroupper
   retq

To:

   vmovups	%ymm0, (%rdi)
   vzeroupper
   retq

Patch reviewed by Nadav Rotem.

Differential Revision: http://reviews.llvm.org/D6698

llvm-svn: 224611
2014-12-19 20:23:41 +00:00
Colin LeMahieu 38ce8cd2e2 [Hexagon] Adding bit extraction and table indexing instructions.
llvm-svn: 224610
2014-12-19 20:01:08 +00:00
Colin LeMahieu 3c7f664d5a [Hexagon] Adding bit insertion instructions.
llvm-svn: 224609
2014-12-19 19:54:38 +00:00
Colin LeMahieu d63ef93b4b [Hexagon] Adding more xtype shift instructions.
llvm-svn: 224608
2014-12-19 19:51:35 +00:00
Kevin Enderby 36c8d3ae63 Add printing the LC_SUB_LIBRARY load command with llvm-objdump’s -private-headers.
llvm-svn: 224607
2014-12-19 19:48:16 +00:00
Colin LeMahieu cc09d1ccc5 [Hexagon] Adding xtype shift instructions.
llvm-svn: 224604
2014-12-19 19:34:50 +00:00
Colin LeMahieu f3db884efb [Hexagon] Adding transfers to and from control registers.
llvm-svn: 224599
2014-12-19 19:06:32 +00:00
Bruno Cardoso Lopes f6cf8ad4e5 Reapply: [InstCombine] Fix visitSwitchInst to use right operand types for sub cstexpr
The visitSwitchInst generates SUB constant expressions to recompute the
switch condition. When truncating the condition to a smaller type, SUB
expressions should use the previous type (before trunc) for both
operands. Also, fix code to also return the modified switch when only
the truncation is performed.

This fixes an assertion crash.

Differential Revision: http://reviews.llvm.org/D6644

rdar://problem/19191835

llvm-svn: 224588
2014-12-19 17:12:35 +00:00
Sanjay Patel ea3c802887 use -0.0 when creating an fneg instruction
Backends recognize (-0.0 - X) as the canonical form for fneg
and produce better code. Eg, ppc64 with 0.0:

   lis r2, ha16(LCPI0_0)
   lfs f0, lo16(LCPI0_0)(r2)
   fsubs f1, f0, f1
   blr

vs. -0.0:

   fneg f1, f1
   blr

Differential Revision: http://reviews.llvm.org/D6723

llvm-svn: 224583
2014-12-19 16:44:08 +00:00
Bruno Cardoso Lopes 3be15b2fa6 Revert "[InstCombine] Fix visitSwitchInst to use right operand types for sub cstexpr"
Reverts commit r224574 to appease buildbots:

The visitSwitchInst generates SUB constant expressions to recompute the
switch condition. When truncating the condition to a smaller type, SUB
expressions should use the previous type (before trunc) for both
operands. This fixes an assertion crash.

llvm-svn: 224576
2014-12-19 14:36:24 +00:00
Bruno Cardoso Lopes c9005f2f2b [InstCombine] Fix visitSwitchInst to use right operand types for sub cstexpr
The visitSwitchInst generates SUB constant expressions to recompute the
switch condition. When truncating the condition to a smaller type, SUB
expressions should use the previous type (before trunc) for both
operands. This fixes an assertion crash.

Differential Revision: http://reviews.llvm.org/D6644

rdar://problem/19191835

llvm-svn: 224574
2014-12-19 14:23:15 +00:00
Juergen Ributzka 4d7f70d47e [Object] Don't crash on empty export lists.
Summary: This fixes the exports iterator if the export list is empty.

Reviewers: Bigcheese, kledzik

Subscribers: llvm-commits

Differential Revision: http://reviews.llvm.org/D6732

llvm-svn: 224563
2014-12-19 02:31:01 +00:00
Colin LeMahieu 5ccbb1298b [Hexagon] Adding loop0/1 sp0/1/2loop0 instructions.
llvm-svn: 224556
2014-12-19 00:06:53 +00:00
David Majnemer 824e011ad7 ConstantFold: Shifting undef by zero results in undef
llvm-svn: 224553
2014-12-18 23:54:43 +00:00
Colin LeMahieu 174476ed96 Reverting 224550, was not ready for commit.
llvm-svn: 224552
2014-12-18 23:36:15 +00:00
Colin LeMahieu 9000481cda [Hexagon] Adding loop0/1 sp0/1/2loop0 instructions.
llvm-svn: 224550
2014-12-18 23:27:51 +00:00
Kevin Enderby a2bd8d98a1 Add printing the LC_SUB_UMBRELLA load command with llvm-objdump’s -private-headers.
llvm-svn: 224548
2014-12-18 23:13:26 +00:00
Kevin Enderby b4b7931748 Add printing the LC_SUB_FRAMEWORK load command with llvm-objdump’s -private-headers.
llvm-svn: 224534
2014-12-18 19:24:35 +00:00
Jozef Kolek 2f27d571c8 [mips][microMIPS] Fix bugs related to atomic SC/LL instructions
Fix bugs related to atomic microMIPS SC/LL instructions: while expanding atomic
operations, the mips32r2 encoding was emitted instead of the microMIPS one.

Differential Revision: http://reviews.llvm.org/D6659

llvm-svn: 224524
2014-12-18 16:39:29 +00:00
Saleem Abdulrasool 0b5a8520ac ARM: fix an off-by-one in the register list access
Fix an off-by-one access introduced in r224502 for push.w and pop.w with single
register operands.  Add test cases for both scenarios.

Thanks to Asiri Rathnayake for pointing out the failure!

llvm-svn: 224521
2014-12-18 16:16:53 +00:00
Toma Tabacu 83cd33c4f6 [mips] Clean up the CodeGen/Mips/inlineasmmemop.ll test. NFC.
Summary:
Improve comments and remove a redundant attribute list.
There are no functional changes (to the CHECK's or to the code).

Part of these changes were suggested in http://reviews.llvm.org/D6637.

Reviewers: dsanders

Reviewed By: dsanders

Subscribers: llvm-commits

Differential Revision: http://reviews.llvm.org/D6705

llvm-svn: 224517
2014-12-18 13:03:51 +00:00
Robert Khasanov 79fb7292d7 [AVX512] Enable FP arithmetic lowering for AVX512VL subsets.
Added RegOp2MemOpTable4 to transform 4th operand from register to memory in merge-masked versions of instructions. 
Added lowering tests.

llvm-svn: 224516
2014-12-18 12:28:22 +00:00
Saleem Abdulrasool 3a23917d48 ARM: improve instruction validation for thumb mode
The ARM Architecture Reference Manual states the following:
  LDM{,IA,DB}:
    The SP cannot be in the list.
    The PC can be in the list.
    If the PC is in the list:
      • the LR must not be in the list
      • the instruction must be either outside any IT block, or the last
        instruction in an IT block.
  POP:
    The PC can be in the list.
    If the PC is in the list:
      • the LR must not be in the list
      • the instruction must be either outside any IT block, or the last
        instruction in an IT block.
  PUSH:
    The SP and PC can be in the list in ARM instructions, but not in Thumb
    instructions.
  STM:{,IA,DB}:
    The SP and PC can be in the list in ARM instructions, but not in Thumb
    instructions.

llvm-svn: 224502
2014-12-18 05:24:38 +00:00
Saleem Abdulrasool 69973e0fa0 test: avoid unnecessary temporary files
Use pipes and redirect the error output to FileCheck directly.  NFC.

llvm-svn: 224501
2014-12-18 05:24:32 +00:00
Eric Christopher 661f2d1ca1 Add a new string member to the TargetOptions struct for the name
of the abi we should be using. For targets that don't use the
option there's no change, otherwise this allows external users
to set the ABI via string and avoid some of the -backend-option
pain in clang.

Use this option to move the ABI for the ARM port from the
Subtarget to the TargetMachine and update the testcases
accordingly since it's no longer valid to set via -mattr.

llvm-svn: 224492
2014-12-18 02:20:58 +00:00
Eric Christopher 1971c3508a Model ARM backend ABI selection after the front end code doing the
same. This will change the "bare metal" ABI from APCS to AAPCS.

The only difference between the front and back end code is that
the code for Triple::GNU was added for environment. That will migrate
to the front end shortly.

Tests updated with the ABI they were originally testing in the case
of bare metal (e.g. -mtriple armv7) or with a -gnu for arm-linux
triples.

llvm-svn: 224489
2014-12-18 02:08:45 +00:00
Duncan P. N. Exon Smith fda0cee7c6 Reapply "Linker: Drop superseded subprograms"
This reverts commit r224416, reapplying r224389.  The buildbots hadn't
recovered after my revert, waiting until David reverted a couple of his
commits.  It looks like it was just bad timing (where we were both
modifying code related to the same assertion).  Trying again...

Here's the original text:

    When a function gets replaced by `ModuleLinker`, drop superseded
    subprograms.  This ensures that the "first" subprogram pointing at a
    function is the same one that `!dbg` references point at.

    This is a stop-gap fix for PR21910.  Notably, this fixes Release+Asserts
    bootstraps that are currently asserting out in
    `LexicalScopes::initialize()` due to the explicit instantiations in
    `lib/IR/Dominators.cpp` eventually getting replaced by -argpromotion.

llvm-svn: 224487
2014-12-18 01:05:33 +00:00
Kevin Enderby d0b6b7fb7f Add printing the LC_LINKER_OPTION load command with llvm-objdump’s -private-headers.
Also corrected the name of the load command so that it does not end in an ’S’, and
corrected the name of the MachO::linker_option_command struct and other places that
had the word option as plural, which did not match the Mac OS X headers.

llvm-svn: 224485
2014-12-18 00:53:40 +00:00
Matt Arsenault 303011a005 R600/SI: Fix f64 inline immediates
llvm-svn: 224458
2014-12-17 21:04:08 +00:00
Jingyue Wu e4c9cf04f5 [NVPTX] Fix bugs related to isSingleValueType
Summary:
With isSingleValueType starting to treat vector types as single-value types,
code that uses this interface needs to be updated.

Test Plan:
vector-global.ll
nvcl-param-align.ll

Reviewers: jholewinski

Reviewed By: jholewinski

Subscribers: llvm-commits, meheff, eliben, jholewinski

Differential Revision: http://reviews.llvm.org/D6573

llvm-svn: 224440
2014-12-17 17:59:04 +00:00
Timur Iskhodzhanov 44ee1c0e91 Fix CR/LF line endings in test case
llvm-svn: 224437
2014-12-17 17:52:12 +00:00
Saleem Abdulrasool 1ce7d31f33 ARM: correct an off-by-one in an assert
The assert was off-by-one, resulting in failures for valid input.

Thanks to Asiri Rathnayake for pointing out the failure!

llvm-svn: 224432
2014-12-17 16:17:44 +00:00
Michael Kuperstein 047b1a0400 [DAGCombine] Slightly improve lowering of BUILD_VECTOR into a shuffle.
This handles the case of a BUILD_VECTOR being constructed out of elements extracted from a vector twice the size of the result vector. Previously this was always scalarized. Now, we try to construct a shuffle node that feeds on extract_subvectors.

This fixes PR15872 and provides a partial fix for PR21711.

Differential Revision: http://reviews.llvm.org/D6678

llvm-svn: 224429
2014-12-17 12:32:17 +00:00
Toma Tabacu a23f13c3b0 [mips] Set GCC-compatible MIPS assembler options before inline asm blocks.
Summary:
When generating MIPS assembly, LLVM always overrides the default assembler options by emitting the '.set noreorder', '.set nomacro' and '.set noat' directives,
while GCC uses the default options if an assembly-level function contains inline assembly code.

This becomes a problem when the code generated by LLVM is interleaved with inline assembly which assumes GCC-like assembler options (from Linux, for example).

This patch fixes these conflicts by setting the appropriate assembler options at the beginning of an inline asm block and popping them at the end.

Reviewers: dsanders

Reviewed By: dsanders

Subscribers: llvm-commits

Differential Revision: http://reviews.llvm.org/D6637

llvm-svn: 224425
2014-12-17 10:56:16 +00:00
Suyog Sarda 43fae93da8 Revert 224119 "This patch recognizes (+ (+ v0, v1) (+ v2, v3)), reorders them for bundling into vector of loads,
and vectorizes it." 

This was re-ordering floating-point data types, resulting in a mismatch in output.

llvm-svn: 224424
2014-12-17 10:34:27 +00:00
Yaron Keren f630971635 Teach lit.cfg to recognize -windows-gnu in addition to -mingw32.
llvm-svn: 224421
2014-12-17 09:55:15 +00:00
Elena Demikhovsky 028e966a54 Added 5 more tests related to sink store revision 224247
- by Ella Bolshinsky

http://reviews.llvm.org/D6420

llvm-svn: 224418
2014-12-17 08:12:59 +00:00
Erik Eckstein a451b9b0b5 Strength reduce intrinsics with overflow into regular arithmetic operations if possible.
Some intrinsics, like s/uadd.with.overflow and umul.with.overflow, are already strength reduced.
This change adds other arithmetic intrinsics: s/usub.with.overflow, smul.with.overflow.
It completes the work on PR20194.
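
A hedged C++ sketch of what the strength reduction means (using the GCC/Clang
__builtin_sub_overflow builtin for brevity; the pass itself works on LLVM IR):

  #include <cstdint>
  #include <utility>

  // What @llvm.ssub.with.overflow.i32 computes: the difference plus a flag.
  static std::pair<int32_t, bool> ssub_overflow(int32_t a, int32_t b) {
    int32_t r;
    bool ov = __builtin_sub_overflow(a, b, &r);
    return {r, ov};
  }

  // When analysis proves overflow is impossible (say both operands are known
  // to lie in [0, 1000]), the pair collapses to a plain sub and a constant flag.
  static std::pair<int32_t, bool> ssub_reduced(int32_t a, int32_t b) {
    return {a - b, false};
  }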

llvm-svn: 224417
2014-12-17 07:29:19 +00:00
Duncan P. N. Exon Smith 92731d26bc Revert "Linker: Drop superseded subprograms"
This reverts commit r224389.  Based on feedback from the bots, the
assertion seems to be going off *more* often, not less (previously I was
just seeing it in an internal bootstrap, now it's happening in public
builds too).

http://lab.llvm.org:8080/green/job/clang-stage2-configure-Rlto_build/936/
http://lab.llvm.org:8011/builders/sanitizer-x86_64-linux-bootstrap/builds/5325

Reverting in order to investigate.

llvm-svn: 224416
2014-12-17 07:27:31 +00:00
Justin Hibbits 0c0d5deff1 Add parsing of 'foo@local'.
Summary:
Currently, it supports generating, but not parsing, this expression.
Test added as well.

Test Plan: New test added, no regressions due to this.

Reviewers: hfinkel

Reviewed By: hfinkel

Subscribers: llvm-commits

Differential Revision: http://reviews.llvm.org/D6672

llvm-svn: 224415
2014-12-17 06:23:35 +00:00
Duncan P. N. Exon Smith f9abf4fb0c llvm-lto: Add testing coverage for local contexts
Add coverage in `llvm-lto` for the API exposed by libLTO to create
modules in local contexts.

The goal here isn't to test the symbol-related API extensively, just to
confirm that these modules work at all.  (I'll be shifting code around
soon that should be NFC and I realized there was no test coverage.)

llvm-svn: 224408
2014-12-17 02:00:38 +00:00
David Majnemer 65c52ae8ca InstSimplify: shl nsw/nuw undef, %V -> undef
We can always choose a value for undef which might cause %V to shift
out an important bit except for one case, when %V is zero.

However, shl behaves like an identity function when the right hand side
is zero.

llvm-svn: 224405
2014-12-17 01:54:33 +00:00
Quentin Colombet fc2201e922 [CodeGenPrepare] Reapply r224351 with a fix for the assertion failure:
The type promotion helper does not support vector types, so the optimization
does not kick in in such cases.

Original commit message:
[CodeGenPrepare] Move sign/zero extensions near loads using type promotion.

This patch extends the optimization in CodeGenPrepare that moves a sign/zero
extension near a load when the target can combine them. The optimization may
promote any operations between the extension and the load to make that possible.

Although this optimization may be beneficial for all targets, in particular
AArch64, this is enabled for X86 only as I have not benchmarked it for other
targets yet.


** Context **

Most targets feature extended loads, i.e., loads that perform a zero or sign
extension for free. In that context it is interesting to expose such pattern in
CodeGenPrepare so that the instruction selection pass can form such loads.
Sometimes, this pattern is blocked because of instructions between the load and
the extension. When those instructions are promotable to the extended type, we
can expose this pattern.


** Motivating Example **

Let us consider an example:
define void @foo(i8* %addr1, i32* %addr2, i8 %a, i32 %b) {
  %ld = load i8* %addr1
  %zextld = zext i8 %ld to i32
  %ld2 = load i32* %addr2
  %add = add nsw i32 %ld2, %zextld
  %sextadd = sext i32 %add to i64
  %zexta = zext i8 %a to i32
  %addza = add nsw i32 %zexta, %zextld
  %sextaddza = sext i32 %addza to i64
  %addb = add nsw i32 %b, %zextld
  %sextaddb = sext i32 %addb to i64
  call void @dummy(i64 %sextadd, i64 %sextaddza, i64 %sextaddb)
  ret void
}

As it is, this IR generates the following assembly on x86_64:
[...]
  movzbl  (%rdi), %eax   # zero-extended load
  movl  (%rsi), %esi     # plain load
  addl  %eax, %esi       # 32-bit add
  movslq  %esi, %rdi     # sign extend the result of add
  movzbl  %dl, %edx      # zero extend the first argument
  addl  %eax, %edx       # 32-bit add
  movslq  %edx, %rsi     # sign extend the result of add
  addl  %eax, %ecx       # 32-bit add
  movslq  %ecx, %rdx     # sign extend the result of add
[...]
The throughput of this sequence is 7.45 cycles on Ivy Bridge according to IACA.

Now, by promoting the additions to form more extended loads we would generate:
[...]
  movzbl  (%rdi), %eax   # zero-extended load
  movslq  (%rsi), %rdi   # sign-extended load
  addq  %rax, %rdi       # 64-bit add
  movzbl  %dl, %esi      # zero extend the first argument
  addq  %rax, %rsi       # 64-bit add
  movslq  %ecx, %rdx     # sign extend the second argument
  addq  %rax, %rdx       # 64-bit add
[...]
The throughput of this sequence is 6.15 cycles on Ivy Bridge according to IACA.

This kind of sequence happens a lot on code using 32-bit indexes on 64-bit
architectures.

Note: The throughput numbers are similar on Sandy Bridge and Haswell.


** Proposed Solution **

To avoid the penalty of all these sign/zero extensions, we merge them in the
loads at the beginning of the chain of computation by promoting all the chain of
computation on the extended type. The promotion is done if and only if we do not
introduce new extensions, i.e., if we do not degrade the code quality.
To achieve this, we extend the existing “move ext to load” optimization with the
promotion mechanism introduced to match larger patterns for addressing mode
(r200947).
The idea of this extension is to perform the following transformation:
ext(promotableInst1(...(promotableInstN(load))))
=>
promotedInst1(...(promotedInstN(ext(load))))

The promotion mechanism in that optimization is enabled by a new TargetLowering
switch, which is off by default. In other words, by default, the optimization
performs the “move ext to load” optimization as it was before this patch.


** Performance **

Configuration: x86_64: Ivy Bridge fixed at 2900MHz running OS X 10.10.
Tested Optimization Levels: O3/Os
Tests: llvm-testsuite + externals.
Results:
- No regression beside noise.
- Improvements:
CINT2006/473.astar:  ~2%
Benchmarks/PAQ8p: ~2%
Misc/perlin: ~3%

The results are consistent for both O3 and Os.

<rdar://problem/18310086>

llvm-svn: 224402
2014-12-17 01:36:17 +00:00
Kevin Enderby 57538299e8 Add printing the LC_ENCRYPTION_INFO_64 load command with llvm-objdump’s -private-headers
and add tests for the two AArch64 binaries.

llvm-svn: 224400
2014-12-17 01:01:30 +00:00
David Blaikie 8b979f01c6 PR21875: codegen for non-type template parameters of nullptr_t type
llvm-svn: 224399
2014-12-17 00:43:22 +00:00
Reid Kleckner 04b69f89aa Revert "[CodeGenPrepare] Move sign/zero extensions near loads using type promotion."
This reverts commit r224351. It causes assertion failures when building
ICU.

llvm-svn: 224397
2014-12-17 00:29:23 +00:00
Hans Wennborg 224cb82a39 SelectionDAG switch lowering: use 'unsigned' to count destination popularity
SwitchInst::getNumCases() returns unsigned, so using uint64_t to count cases
seems unnecessary.

Also fix a missing CHECK in the test case.

llvm-svn: 224393
2014-12-16 23:41:59 +00:00
Colin LeMahieu aa1bade7b4 [Hexagon] Updating doubleword shift usages to new versions.
llvm-svn: 224391
2014-12-16 23:36:15 +00:00
Kevin Enderby 0804f467f2 Add printing the LC_ENCRYPTION_INFO load command with llvm-objdump’s -private-headers.
llvm-svn: 224390
2014-12-16 23:25:52 +00:00
Duncan P. N. Exon Smith 8759026893 Linker: Drop superseded subprograms
When a function gets replaced by `ModuleLinker`, drop superseded
subprograms.  This ensures that the "first" subprogram pointing at a
function is the same one that `!dbg` references point at.

This is a stop-gap fix for PR21910.  Notably, this fixes Release+Asserts
bootstraps that are currently asserting out in
`LexicalScopes::initialize()` due to the explicit instantiations in
`lib/IR/Dominators.cpp` eventually getting replaced by -argpromotion.

llvm-svn: 224389
2014-12-16 23:23:41 +00:00
Sanjay Patel 494a625fee fix typo, add spaces; NFC
llvm-svn: 224384
2014-12-16 22:48:42 +00:00
Simon Pilgrim bf1e079005 [X86][SSE] Vector double -> float conversion memory folding (cvtpd2ps)
Added a missing memory folding relationship for the (V)CVTPD2PS instruction - we can safely fold these for stack reloads.

Differential Revision: http://reviews.llvm.org/D6663

llvm-svn: 224383
2014-12-16 22:30:10 +00:00
Sanjay Patel 7129c10cae merge consecutive loads that are offset from a base address
SelectionDAG::isConsecutiveLoad() was not detecting consecutive loads
when the first load was offset from a base address. 

This patch recognizes that pattern and subtracts the offset before comparing
the second load to see if it is consecutive.
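
A minimal sketch of the idea (hypothetical types and names, not the actual
SelectionDAG code): normalize both loads to a common base and compare their
distance against the width of the first load.

  #include <cstdint>

  struct LoadInfo {
    int64_t OffsetFromBase; // displacement from the shared base address
    int64_t Width;          // size of the loaded value in bytes
  };

  // Loads A and B are consecutive if B starts exactly where A ends.
  static bool isConsecutive(const LoadInfo &A, const LoadInfo &B) {
    return B.OffsetFromBase - A.OffsetFromBase == A.Width;
  }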

The codegen change in the new test case improves from:

vmovsd	32(%rdi), %xmm0
vmovsd	48(%rdi), %xmm1 
vmovhpd	56(%rdi), %xmm1, %xmm1
vmovhpd	40(%rdi), %xmm0, %xmm0
vinsertf128	$1, %xmm1, %ymm0, %ymm0

To:

vmovups	32(%rdi), %ymm0

An existing test case is also improved from:

vmovsd	(%rdi), %xmm0
vmovsd	16(%rdi), %xmm1
vmovsd	24(%rdi), %xmm2
vunpcklpd	%xmm2, %xmm0, %xmm0 ## xmm0 = xmm0[0],xmm2[0]
vmovhpd	8(%rdi), %xmm1, %xmm3

To:

vmovsd	(%rdi), %xmm0
vmovsd	16(%rdi), %xmm1
vmovhpd	24(%rdi), %xmm0, %xmm0
vmovhpd	8(%rdi), %xmm1, %xmm1

This patch fixes PR21771 ( http://llvm.org/bugs/show_bug.cgi?id=21771 ).

Differential Revision: http://reviews.llvm.org/D6642

llvm-svn: 224379
2014-12-16 21:57:18 +00:00
Kevin Enderby 1ff0ecc7a1 Fix a bug in llvm-objdump’s -private-headers for the LC_VERSION_MIN_IPHONEOS
load command not getting printed.

llvm-svn: 224376
2014-12-16 21:48:27 +00:00
Colin LeMahieu f5acc8c625 [Hexagon] Adding tstbit/bitclr/bitset instructions.
llvm-svn: 224374
2014-12-16 21:28:58 +00:00
Kostya Serebryany 7376294086 [sanitizer] prevent function call merging for sanitizer-coverage callbacks
llvm-svn: 224372
2014-12-16 21:24:15 +00:00
Colin LeMahieu 615757f2f1 [Hexagon] Adding bit count and twiddling instructions.
llvm-svn: 224367
2014-12-16 20:57:56 +00:00
Colin LeMahieu 6fce46baf6 [Hexagon] Adding asr/lsr/asl reg/imm, asl with saturation, asr with rounding. Doubleword abs/neg/not. Interleave and deinterleave instructions.
llvm-svn: 224365
2014-12-16 20:40:23 +00:00
JF Bastien 5d3280c7a7 x86-32: PUSHF/POPF use/def EFLAGS
Summary: As a side-quest for D6629 jvoung pointed out that I should use -verify-machineinstrs and this found a bug in x86-32's handling of EFLAGS for PUSHF/POPF. This patch fixes the use/def, and adds -verify-machineinstrs to all x86 tests which contain 'EFLAGS'. One exception: this patch leaves inline-asm-fpstack.ll as-is because it fails -verify-machineinstrs in a way unrelated to EFLAGS. This patch also modifies cmpxchg-clobber-flags.ll along the lines of what D6629 already does by also testing i386.

Test Plan: ninja check

Reviewers: t.p.northover, jvoung

Subscribers: llvm-commits

Differential Revision: http://reviews.llvm.org/D6687

llvm-svn: 224359
2014-12-16 20:15:45 +00:00
Quentin Colombet d5e57b731f [CodeGenPrepare] Move sign/zero extensions near loads using type promotion.
This patch extends the optimization in CodeGenPrepare that moves a sign/zero
extension near a load when the target can combine them. The optimization may
promote any operations between the extension and the load to make that possible.

Although this optimization may be beneficial for all targets, in particular
AArch64, this is enabled for X86 only as I have not benchmarked it for other
targets yet.


** Context **

Most targets feature extended loads, i.e., loads that perform a zero or sign
extension for free. In that context it is interesting to expose such pattern in
CodeGenPrepare so that the instruction selection pass can form such loads.
Sometimes, this pattern is blocked because of instructions between the load and
the extension. When those instructions are promotable to the extended type, we
can expose this pattern.


** Motivating Example **

Let us consider an example:
define void @foo(i8* %addr1, i32* %addr2, i8 %a, i32 %b) {
  %ld = load i8* %addr1
  %zextld = zext i8 %ld to i32
  %ld2 = load i32* %addr2
  %add = add nsw i32 %ld2, %zextld
  %sextadd = sext i32 %add to i64
  %zexta = zext i8 %a to i32
  %addza = add nsw i32 %zexta, %zextld
  %sextaddza = sext i32 %addza to i64
  %addb = add nsw i32 %b, %zextld
  %sextaddb = sext i32 %addb to i64
  call void @dummy(i64 %sextadd, i64 %sextaddza, i64 %sextaddb)
  ret void
}

As it is, this IR generates the following assembly on x86_64:
[...]
  movzbl  (%rdi), %eax   # zero-extended load
  movl  (%rsi), %esi     # plain load
  addl  %eax, %esi       # 32-bit add
  movslq  %esi, %rdi     # sign extend the result of add
  movzbl  %dl, %edx      # zero extend the first argument
  addl  %eax, %edx       # 32-bit add
  movslq  %edx, %rsi     # sign extend the result of add
  addl  %eax, %ecx       # 32-bit add
  movslq  %ecx, %rdx     # sign extend the result of add
[...]
The throughput of this sequence is 7.45 cycles on Ivy Bridge according to IACA.

Now, by promoting the additions to form more extended loads we would generate:
[...]
  movzbl  (%rdi), %eax   # zero-extended load
  movslq  (%rsi), %rdi   # sign-extended load
  addq  %rax, %rdi       # 64-bit add
  movzbl  %dl, %esi      # zero extend the first argument
  addq  %rax, %rsi       # 64-bit add
  movslq  %ecx, %rdx     # sign extend the second argument
  addq  %rax, %rdx       # 64-bit add
[...]
The throughput of this sequence is 6.15 cycles on Ivy Bridge according to IACA.

This kind of sequence happens a lot on code using 32-bit indexes on 64-bit
architectures.

Note: The throughput numbers are similar on Sandy Bridge and Haswell.


** Proposed Solution **

To avoid the penalty of all these sign/zero extensions, we merge them in the
loads at the beginning of the chain of computation by promoting all the chain of
computation on the extended type. The promotion is done if and only if we do not
introduce new extensions, i.e., if we do not degrade the code quality.
To achieve this, we extend the existing “move ext to load” optimization with the
promotion mechanism introduced to match larger patterns for addressing mode
(r200947).
The idea of this extension is to perform the following transformation:
ext(promotableInst1(...(promotableInstN(load))))
=>
promotedInst1(...(promotedInstN(ext(load))))

The promotion mechanism in that optimization is enabled by a new TargetLowering
switch, which is off by default. In other words, by default, the optimization
performs the “move ext to load” optimization as it was before this patch.


** Performance **

Configuration: x86_64: Ivy Bridge fixed at 2900MHz running OS X 10.10.
Tested Optimization Levels: O3/Os
Tests: llvm-testsuite + externals.
Results:
- No regression beside noise.
- Improvements:
CINT2006/473.astar:  ~2%
Benchmarks/PAQ8p: ~2%
Misc/perlin: ~3%

The results are consistent for both O3 and Os.

<rdar://problem/18310086>

llvm-svn: 224351
2014-12-16 19:09:03 +00:00
Robert Khasanov d04cd2fbfe [AVX512] Enable integer arithmetic lowering for AVX512BW/VL subsets.
Added lowering tests.

llvm-svn: 224349
2014-12-16 18:24:07 +00:00
Ahmed Bougacha 0dc1979293 [MC] Reset the MCInst in the matcher function before adding opcode/operands.
On X86, the Intel asm parser tries to match all memory operand sizes when
none is explicitly specified.  For LEA, which doesn't really have a memory
operand (just a pointer one), this results in multiple successful matches,
one for each memory size.  There's no error because it's the same opcode, so
really, it's just one match.  However, the tablegen'd matcher function
adds opcode/operands to the passed MCInst, and this results in multiple
duplicated operands.

This commit clears the MCInst in the tablegen'd matcher function.
We sometimes clear it when the match failed, so there's no expectation of
keeping the previous content anyway.

Differential Revision: http://reviews.llvm.org/D6670

llvm-svn: 224347
2014-12-16 18:05:28 +00:00
Colin LeMahieu 1944a8cd04 [Hexagon] Adding absolute value, and negate with saturation
llvm-svn: 224346
2014-12-16 17:44:49 +00:00
Sanjay Patel e46d54f0bf combine consecutive subvector 16-byte loads into one 32-byte load
This is a fix for PR21709 ( http://llvm.org/bugs/show_bug.cgi?id=21709 ).
When we have 2 consecutive 16-byte loads that are merged into one 32-byte vector,
we can use a single 32-byte load instead. 
But we don't do this for SandyBridge / IvyBridge because they have slower 32-byte memops.
We also don't bother using 32-byte *integer* loads on a machine that only has AVX1 (btver2)
because those operands would have to be split in half anyway since there is no support for
32-byte integer math ops.

Differential Revision: http://reviews.llvm.org/D6492

llvm-svn: 224344
2014-12-16 16:30:01 +00:00
Colin LeMahieu 455f24aa77 [Hexagon] Adding saturate and swizzle instructions.
llvm-svn: 224343
2014-12-16 16:27:17 +00:00
Zoran Jovanovic 2deca34803 [mips][microMIPS] Implement SWP and LWP instructions
Differential Revision: http://reviews.llvm.org/D5667

llvm-svn: 224338
2014-12-16 14:59:10 +00:00
Vladimir Medic a489a631ae Add disassembler tests for mips4 platform. There are no functional changes.
llvm-svn: 224335
2014-12-16 13:02:25 +00:00
Elena Demikhovsky f5b72afff4 Masked Load and Store Intrinsics in loop vectorizer.
The loop vectorizer optimizes loops containing conditional memory
accesses by generating masked load and store intrinsics.
This decision is target dependent.
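
A hedged C++ example of the kind of loop this targets (assuming the target
reports support for masked memory operations, e.g. AVX2/AVX-512): the
conditional access below can now be vectorized with masked load/store
intrinsics instead of forcing the loop to stay scalar.

  // Each iteration conditionally reads and writes a[i]; without masked
  // memory operations the vectorizer would typically leave this loop scalar.
  void scale_if_positive(float *a, const float *trigger, int n) {
    for (int i = 0; i < n; ++i)
      if (trigger[i] > 0.0f)
        a[i] = a[i] * 2.0f + 1.0f;
  }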

http://reviews.llvm.org/D6527

llvm-svn: 224334
2014-12-16 11:50:42 +00:00
Daniel Sanders 97797a8a0f [mips] Fix arguments-struct.ll for Windows and OSX hosts.
llvm-svn: 224333
2014-12-16 11:21:58 +00:00
Bradley Smith ececb7f6e2 [ARM] Prevent PerformVCVTCombine from combining a vmul/vcvt with 8 lanes
This would result in a crash since the vcvt used does not support v8i32 types.

llvm-svn: 224332
2014-12-16 10:59:27 +00:00
Duncan P. N. Exon Smith bb7d2fb1e5 IR: Stop printing 'metadata' in Metadata::print()
Stop printing `metadata` in `Metadata::print()` and
`Metadata::printAsOperand()`.

llvm-svn: 224327
2014-12-16 07:40:31 +00:00
Duncan P. N. Exon Smith 68c37245d3 DebugInfo: Update testcase to actually check something
This test was missing a `Debug Info Version` so its `not grep` was
passing vacuously.  Update it to CHECK for something useful at the same
time so it doesn't bitrot quite so easily in the future.

llvm-svn: 224324
2014-12-16 07:08:19 +00:00
Saleem Abdulrasool 417fc6b303 ARM: diagnose deprecated syntax
The use of SP and PC in the register list for stores is deprecated on ARM
(ARM ARM A.8.8.199):

  ARM deprecates the use of ARM instructions that include the SP or the PC in
  the list.

Provide a deprecation warning from the assembler in the case that the syntax is
ever seen.

llvm-svn: 224319
2014-12-16 05:53:25 +00:00
Hal Finkel 8adf2254ef [PowerPC] Improve instruction selection bit-permuting operations (32-bit)
The PowerPC backend, somewhat embarrassingly, did not generate an
optimal-length sequence of instructions for a 32-bit bswap. While adding a
pattern for the bswap intrinsic to fix this would not have been terribly
difficult, doing so would not have addressed the real problem: we had been
generating poor code for many bit-permuting operations (by which I mean things
like byte-swap that permute the bits of one or more inputs around in various
ways). Here are some initial steps toward solving this deficiency.

Bit-permuting operations are represented, at the SDAG level, using ISD::ROTL,
SHL, SRL, AND and OR (mostly with constant second operands). Looking back
through these operations, we can build up a description of the bits in the
resulting value in terms of bits of one or more input values (and constant
zeros). For each bit, we compute the rotation amount from the original value,
and then group consecutive (value, rotation factor) bits into groups. Groups
sharing these attributes are then collected and sorted, and we can then
instruction select the entire permutation using a combination of masked
rotations (rlwinm), imm ands (andi/andis), and masked rotation inserts
(rlwimi).

The result is that instead of lowering an i32 bswap as:

	rlwinm 5, 3, 24, 16, 23
	rlwinm 4, 3, 24, 0, 7
	rlwimi 4, 3, 8, 8, 15
	rlwimi 5, 3, 8, 24, 31
	rlwimi 4, 5, 0, 16, 31

we now produce:

	rlwinm 4, 3, 8, 0, 31
	rlwimi 4, 3, 24, 16, 23
	rlwimi 4, 3, 24, 0, 7

and for the 'test6' example in the PowerPC/README.txt file:

 unsigned test6(unsigned x) {
   return ((x & 0x00FF0000) >> 16) | ((x & 0x000000FF) << 16);
 }

we used to produce:

	lis 4, 255
	rlwinm 3, 3, 16, 0, 31
	ori 4, 4, 255
	and 3, 3, 4

and now we produce:

	rlwinm 4, 3, 16, 24, 31
	rlwimi 4, 3, 16, 8, 15

and, as a nice bonus, this fixes the FIXME in
test/CodeGen/PowerPC/rlwimi-and.ll.

This commit does not include instruction-selection for i64 operations, those
will come later.

llvm-svn: 224318
2014-12-16 05:51:41 +00:00
Rafael Espindola 9d1020648c Start adding thin archive support.
This is just sufficient for 'ar t' to work.

llvm-svn: 224307
2014-12-16 01:43:41 +00:00
Kevin Enderby c971338def Fix a bug in llvm-objdump’s -private-headers for 32-bit Mach-O files
printing the section header.  And add some tests for this for 32-bit files.

llvm-svn: 224302
2014-12-16 01:14:45 +00:00
Adrian Prantl b9fa945d51 ARM/AArch64: Attach the FrameSetup MIFlag to CFI instructions.
Debug info marks the first instruction without the FrameSetup flag
as being the end of the function prologue. Any CFI instructions in the
middle of the function prologue would cause debug info to end the prologue
too early and worse, attach the line number of the CFI instruction, which
incidentally is often 0.

llvm-svn: 224294
2014-12-16 00:20:49 +00:00
Colin LeMahieu d9a00a9c38 [Hexagon] Adding doubleword multiplies with and without accumulation.
llvm-svn: 224293
2014-12-16 00:07:24 +00:00
Colin LeMahieu 18c927620a [Hexagon] Adding halfword to doubleword multiplies.
llvm-svn: 224289
2014-12-15 23:29:37 +00:00
Colin LeMahieu 64ffd52943 [Hexagon] Adding logical-logical accumulation instructions and tests.
llvm-svn: 224288
2014-12-15 23:19:07 +00:00
Sanjoy Das 4555b6d870 Teach ScalarEvolution to exploit min and max expressions when proving
isKnownPredicate.

The motivation for this change is to optimize away checks in loops
like this:

    limit = min(t, len)
    for (i = 0 to limit)
      if (i >= len || i < 0) throw_array_out_of_bounds();
      a[i] = ...

Differential Revision: http://reviews.llvm.org/D6635

llvm-svn: 224285
2014-12-15 22:50:15 +00:00
Simon Pilgrim dad5686881 Added missing tests for X86vzmovl folding. NFC.
llvm-svn: 224284
2014-12-15 22:45:48 +00:00
JF Bastien 388b8794c9 x86: Emit LOCK prefix after DATA16
Summary: x86 allows either ordering for the LOCK and DATA16 prefixes, but using GCC+GAS leads to different code generation than using LLVM. This change matches the order that GAS emits the x86 prefixes when a semicolon isn't used in inline assembly (see tc-i386.c comment before define LOCK_PREFIX), and helps simplify tooling that operates on the instruction's byte sequence (such as NaCl's validator). This change shouldn't have any performance impact.

Test Plan: ninja check

Reviewers: craig.topper, jvoung

Subscribers: jfb, llvm-commits

Differential Revision: http://reviews.llvm.org/D6630

llvm-svn: 224283
2014-12-15 22:34:58 +00:00
Colin LeMahieu 71e11a1d0d [Hexagon] Adding a number of additional multiply forms with tests.
llvm-svn: 224282
2014-12-15 22:10:37 +00:00
Colin LeMahieu 4a46429305 [Hexagon] Adding misc multiply encodings and tests.
llvm-svn: 224273
2014-12-15 21:17:03 +00:00
Colin LeMahieu 26f884aedf [Hexagon] Adding doubleword accumulating multiplies of halfwords.
llvm-svn: 224267
2014-12-15 20:17:46 +00:00
Colin LeMahieu 572c53e258 [Hexagon] Adding accumulating half word multiplies.
llvm-svn: 224266
2014-12-15 20:10:28 +00:00
Colin LeMahieu d1704cdc07 [Hexagon] Adding multiply with rnd/sat/rndsat
llvm-svn: 224265
2014-12-15 20:01:59 +00:00
Ahmed Bougacha b9d92194dc [X86] And also test INSERTPS shuffle mask pretty-printing.
For r224260.

llvm-svn: 224264
2014-12-15 19:47:35 +00:00
Colin LeMahieu fe4012a969 [Hexagon] Adding encoding bits for halfword multiplies.
llvm-svn: 224261
2014-12-15 19:22:07 +00:00
Duncan P. N. Exon Smith be7ea19b58 IR: Make metadata typeless in assembly
Now that `Metadata` is typeless, reflect that in the assembly.  These
are the matching assembly changes for the metadata/value split in
r223802.

  - Only use the `metadata` type when referencing metadata from a call
    intrinsic -- i.e., only when it's used as a `Value`.

  - Stop pretending that `ValueAsMetadata` is wrapped in an `MDNode`
    when referencing it from call intrinsics.

So, assembly like this:

    define @foo(i32 %v) {
      call void @llvm.foo(metadata !{i32 %v}, metadata !0)
      call void @llvm.foo(metadata !{i32 7}, metadata !0)
      call void @llvm.foo(metadata !1, metadata !0)
      call void @llvm.foo(metadata !3, metadata !0)
      call void @llvm.foo(metadata !{metadata !3}, metadata !0)
      ret void, !bar !2
    }
    !0 = metadata !{metadata !2}
    !1 = metadata !{i32* @global}
    !2 = metadata !{metadata !3}
    !3 = metadata !{}

turns into this:

    define @foo(i32 %v) {
      call void @llvm.foo(metadata i32 %v, metadata !0)
      call void @llvm.foo(metadata i32 7, metadata !0)
      call void @llvm.foo(metadata i32* @global, metadata !0)
      call void @llvm.foo(metadata !3, metadata !0)
      call void @llvm.foo(metadata !{!3}, metadata !0)
      ret void, !bar !2
    }
    !0 = !{!2}
    !1 = !{i32* @global}
    !2 = !{!3}
    !3 = !{}

I wrote an upgrade script that handled almost all of the tests in llvm
and many of the tests in cfe (even handling many `CHECK` lines).  I've
attached it (or will attach it in a moment if you're speedy) to PR21532
to help everyone update their out-of-tree testcases.

This is part of PR21532.

llvm-svn: 224257
2014-12-15 19:07:53 +00:00
Reid Kleckner b736bf3899 Move mips1 tests to test/MC/Disassembler/Mips/mips1
This matches the pattern of the mips2 and 3 tests, as well as our normal
conventions.

llvm-svn: 224254
2014-12-15 17:56:02 +00:00
Vladimir Medic d7ecf49e97 Add disassembler tests for mips3 platform. There are no functional changes.
llvm-svn: 224253
2014-12-15 16:19:34 +00:00
Vladimir Medic 19703a0bd6 Add disassembler tests for mips2 platform. There are no functional changes.
llvm-svn: 224252
2014-12-15 15:58:20 +00:00
Vladimir Medic a67937331d This is the first in a series of patches that add missing disassembler tests for the mips platform. The patches are divided per version of mips CPU to keep the patches smaller and ease the review. There are no functional changes; code is changed only if new tests reveal a bug. This patch adds disassembler tests for the mips1 CPU.
llvm-svn: 224251
2014-12-15 15:22:33 +00:00
Elena Demikhovsky bf74736290 Added a test related to 224247 revision
llvm-svn: 224248
2014-12-15 14:14:10 +00:00
Michael Kuperstein 47c97157ef [X86] Break false dependencies before partial register updates when the source operand is in memory
Adds the various "rm" instruction variants into the list of instructions that have a partial register update. Also adds all variants of SQRTSD that were missing in the original list.

Differential Revision: http://reviews.llvm.org/D6620

llvm-svn: 224246
2014-12-15 13:18:21 +00:00
Suyog Sarda 2b27fc78a2 Typo Correction in Test Case. NFC.
llvm-svn: 224244
2014-12-15 12:19:46 +00:00
Elena Demikhovsky 72860c341e AVX-512: Added EXPAND instructions and intrinsics.
llvm-svn: 224241
2014-12-15 10:03:52 +00:00
Hal Finkel 4104a1a346 [PowerPC] Handle cmp op promotion for SELECT[_CC] nodes in PPCTL::DAGCombineExtBoolTrunc
PPCTargetLowering::DAGCombineExtBoolTrunc contains logic to remove unwanted
truncations and extensions when dealing with nodes of the form:
  zext(binary-ops(binary-ops(trunc(x), trunc(y)), ...)

There was a FIXME in the implementation (now removed) regarding the fact that
the function would abort the transformations if any of the non-output operands
of a SELECT or SELECT_CC node would need to be promoted (because they were
also output operands, for example). As a result, we continued to generate
unnecessary zero-extends for code such as this:

  unsigned foo(unsigned a, unsigned b) {
    return  (a <= b) ? a : b;
  }

which would produce:

  cmplw 0, 3, 4
  isel 3, 4, 3, 1
  rldicl 3, 3, 0, 32
  blr

and now we produce:

  cmplw 0, 3, 4
  isel 3, 4, 3, 1
  blr

which is better in the obvious way.

llvm-svn: 224213
2014-12-14 05:53:19 +00:00
Ahmed Bougacha 0cb861634b Reapply "[ARM] Combine base-updating/post-incrementing vector load/stores."
r223862 tried to also combine base-updating load/stores.
r224198 reverted it, as "it created a regression on the test-suite
on test MultiSource/Benchmarks/Ptrdist/anagram by scrambling the order
in which the words are shown."
Reapply, with a fix to ignore non-normal load/stores.
Truncstores are handled elsewhere (you can actually write a pattern for
those, whereas for postinc loads you can't, since they return two values),
but it should be possible to also combine extloads base updates, by checking
that the memory (rather than result) type is of the same size as the addend.

Original commit message:
We used to only combine intrinsics, and turn them into VLD1_UPD/VST1_UPD
when the base pointer is incremented after the load/store.

We can do the same thing for generic load/stores.

Note that we can only combine the first load/store+adds pair in
a sequence (as might be generated for a v16f32 load for instance),
because other combines turn the base pointer addition chain (each
computing the address of the next load, from the address of the last
load) into independent additions (common base pointer + this load's
offset).

Differential Revision: http://reviews.llvm.org/D6585

llvm-svn: 224203
2014-12-13 23:22:12 +00:00
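
A hedged C++ sketch, assuming an ARM target with NEON and <arm_neon.h>, of the generic load/store-plus-increment pattern this combine can now turn into post-incrementing VLD1/VST1; the function name and scaling loop are illustrative only:

  #include <arm_neon.h>

  // n is assumed to be a multiple of 4.
  void scale_by_k(float *dst, const float *src, int n, float k) {
    for (int i = 0; i < n; i += 4) {
      float32x4_t v = vld1q_f32(src);    // plain vector load...
      src += 4;                          // ...followed by a base-pointer increment
      vst1q_f32(dst, vmulq_n_f32(v, k)); // same load/store + add pattern on the store side
      dst += 4;
    }
  }
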
Renato Golin df8f9b6dc9 Revert "[ARM] Combine base-updating/post-incrementing vector load/stores."
This reverts commit r223862, as it created a regression on the test-suite
on test MultiSource/Benchmarks/Ptrdist/anagram by scrambling the order
in which the words are shown. We'll investigate the issue and re-apply
when safe.

llvm-svn: 224198
2014-12-13 20:23:18 +00:00
Akira Hatanaka 7ba78302b5 Rename argument strings of codegen passes to avoid collisions with command line
options.

This commit changes the command line arguments (PassInfo::PassArgument) of two
passes, MachineFunctionPrinter and MachineScheduler, to avoid collisions with
command line options that have the same argument strings.

This bug manifests when the PassList construct (defined in opt.cpp) is used
in a tool that links with codegen passes. To reproduce the bug, paste the
following lines into llc.cpp and run llc.

#include "llvm/IR/LegacyPassNameParser.h"
static llvm::cl::list<const llvm::PassInfo*, bool, llvm::PassNameParser>
PassList(llvm::cl::desc("Optimizations available:"));

rdar://problem/19212448

llvm-svn: 224186
2014-12-13 04:52:04 +00:00
Hal Finkel 4c6658feb0 [PowerPC] Add a DAGToDAG peephole to remove unnecessary zero-exts
On PPC64, we end up with lots of i32 -> i64 zero extensions, not only from all
of the usual places, but also from the ABI, which specifies that values passed
are zero extended. Almost all 32-bit PPC instructions in PPC64 mode are defined
to do *something* to the higher-order bits, and for some instructions, that
action clears those bits (thus providing a zero-extended result). This is
especially common after rotate-and-mask instructions. Adding an additional
instruction to zero-extend the results of these instructions is unnecessary.

This PPCISelDAGToDAG peephole optimization examines these zero-extensions, and
looks back through their operands to see if all instructions will implicitly
zero extend their results. If so, we convert these instructions to their 64-bit
variants (which is an internal change only, the actual encoding of these
instructions is the same as the original 32-bit ones) and remove the
unnecessary zero-extension (changing where the INSERT_SUBREG instructions are
to make everything internally consistent).

llvm-svn: 224169
2014-12-12 23:59:36 +00:00
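
An illustrative C++ fragment (not from the commit) showing the shape of code the peephole helps: the rotate-and-mask generated for the bitfield extract already clears the upper 32 bits, so the zero-extension to 64 bits is redundant:

  #include <cstdint>

  uint64_t extract_field(uint32_t x) {
    uint32_t field = (x >> 8) & 0xffff; // becomes a rotate-and-mask (rlwinm-style) on PPC
    return field;                       // the implicit i32 -> i64 zero-extension here is
                                        // what the peephole can now remove
  }
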
David Majnemer 9b6097586c ValueTracking: Don't recurse too deeply in computeKnownBitsFromAssume
Respect the MaxDepth recursion limit, doing otherwise will trigger an
assert in computeKnownBits.

This fixes PR21891.

llvm-svn: 224168
2014-12-12 23:59:29 +00:00
Chad Rosier 620fb2206d [ARMConstantIsland] Insert tbb/tbh optimization where previous jump table resided.
llvm-svn: 224165
2014-12-12 23:27:40 +00:00
Colin LeMahieu 90482a77b1 [Hexagon] Adding double word add/min/minu/max/maxu instructions and tests.
llvm-svn: 224153
2014-12-12 21:29:25 +00:00
Nico Weber cdab5b6935 Revert r224149, llvm-dsymutil was already here.
I saw a failure on an internal bot, opened this file, saw it was missing,
thought "aha!", tried to land, got an "file is out of date", synced, didn't see
the file listed right above the line I added (cause I didn't add it in the
right place) and landed. Apologies!

llvm-svn: 224152
2014-12-12 21:25:07 +00:00
Colin LeMahieu 984ef17d66 [Hexagon] Adding J class call instructions.
llvm-svn: 224150
2014-12-12 21:12:27 +00:00
Nico Weber e52fd5b3e5 Add llvm-dsymutil to test/CMakeLists.txt
r224134 added this and runs it from a test, but doesn't build it with test
binaries.

llvm-svn: 224149
2014-12-12 20:56:49 +00:00
Reid Kleckner 5135227549 Relax debug-map-parsing.test error message check for Windows
On Windows we get the string "no such file or directory".

llvm-svn: 224141
2014-12-12 18:52:07 +00:00
Frederic Riss 231f714e54 Initial dsymutil tool commit.
The goal of this tool is to replicate Darwin's dsymutil functionality
based on LLVM. dsymutil is a DWARF linker. Darwin's linker (ld64) does
not link the debug information; it leaves it in the object files in
relocatable form, but embeds a `debug map` into the executable that
describes where to find the debug information and how to relocate it.
When releasing/archiving a binary, dsymutil is called to link all the DWARF
information into a `dsym bundle` that can be distributed/stored along with
the binary.

With this commit, the LLVM based dsymutil is just able to parse the STABS
debug maps embedded by ld64 in linked binaries (and not all of them, for
example archives aren't supported yet).

Note that the tool directory is called dsymutil, but the executable is
currently called llvm-dsymutil. This discrepancy will disappear once the
tool is feature complete. At that point the executable will be renamed
to dsymutil, but until then you do not want it to override the system one.

    Differential Revision: http://reviews.llvm.org/D6242

llvm-svn: 224134
2014-12-12 17:31:24 +00:00
Robert Khasanov 37c3ad6c20 [AVX512] Enabling bit logic lowering
Added lowering tests.

llvm-svn: 224132
2014-12-12 17:02:18 +00:00
Vasileios Kalintiris 8edbcad8e5 [mips] Enable code generation for MIPS-III.
Summary:
This commit enables the MIPS-III target and adds support for code
generation of SELECT nodes. We have to use pseudo-instructions with
custom inserters for these nodes as MIPS-III CPUs do not have
conditional-move instructions.

Depends on D6212

Reviewers: dsanders

Subscribers: llvm-commits

Differential Revision: http://reviews.llvm.org/D6464

llvm-svn: 224128
2014-12-12 15:16:46 +00:00
Robert Khasanov e82a3630b7 [AVX512] Enabling MIN/MAX lowering.
Added lowering tests.

llvm-svn: 224127
2014-12-12 15:10:43 +00:00
Andrea Di Biagio d65fd9facd Reapply "[MachineScheduler] Fix for PR21807: minor code difference building with/without -g."
This reapplies r224118 with a fix for test 'misched-code-difference-with-debug.ll'.
That test was failing on some buildbots because it was x86 specific but it was
missing a target triple.
Added an explicit triple to test misched-code-difference-with-debug.ll.

llvm-svn: 224126
2014-12-12 15:09:58 +00:00
Vasileios Kalintiris f53f785a6e [mips] Support SELECT nodes for targets that don't have conditional-move instructions.
Summary:
For Mips targets that do not have conditional-move instructions, ie. targets
before MIPS32 and MIPS-IV, we have to insert a diamond control-flow
pattern in order to support SELECT nodes. In order to do that, we add
pseudo-instructions with a custom inserter that emits the necessary
control-flow that selects the correct value.

With this patch we add complete support for code generation of Mips-II targets
based on the LLVM test-suite.

Reviewers: dsanders

Reviewed By: dsanders

Subscribers: llvm-commits

Differential Revision: http://reviews.llvm.org/D6212

llvm-svn: 224124
2014-12-12 14:41:37 +00:00
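
A conceptual sketch, written as plain C++ rather than MachineIR (names invented), of the diamond the custom inserter emits for a SELECT on targets without conditional moves:

  int select_like(int cond, int a, int b) {
    // With a conditional-move instruction this is a single select-style op.
    // Without one, the inserter builds the equivalent of this diamond:
    int result;
    if (cond)        // conditional branch over the "false" side
      result = a;    // true block
    else
      result = b;    // false block
    return result;   // join block, a PHI of the two incoming values
  }
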
Andrea Di Biagio 5634a54efc Revert: [MachineScheduler] Fix for PR21807: minor code difference building with/without -g.
Test 'misched-code-difference-with-debug.ll' was failing on some buildbots.

llvm-svn: 224121
2014-12-12 13:34:03 +00:00
Suyog Sarda 384095e65c This patch recognizes (+ (+ v0, v1) (+ v2, v3)), reorders them for bundling into a vector of loads,
and vectorizes it.

Test case:

  float hadd(float* a) {
    return (a[0] + a[1]) + (a[2] + a[3]);
  }

AArch64 assembly before patch:

  ldp   s0, s1, [x0]
  ldp   s2, s3, [x0, #8]
  fadd  s0, s0, s1
  fadd  s1, s2, s3
  fadd  s0, s0, s1
  ret

AArch64 assembly after patch:

  ldp   d0, d1, [x0]
  fadd  v0.2s, v0.2s, v1.2s
  faddp s0, v0.2s
  ret

Reviewed Link: http://lists.cs.uiuc.edu/pipermail/llvm-commits/Week-of-Mon-20141208/248531.html

llvm-svn: 224119
2014-12-12 12:53:44 +00:00
Andrea Di Biagio 01236e3eca [MachineScheduler] Fix for PR21807: minor code difference building with/without -g.
This patch fixes the issue reported as PR21807. There was a minor difference
in the generated code depending on the -g flag.

The cause was that with -g the machine scheduler used a different
scheduling strategy. This decision was based on the number of instructions
in a schedule region and included debug instructions in that count.

This patch fixes the issue in MISched and provides a test.

Patch by Russell Gallop!

llvm-svn: 224118
2014-12-12 12:41:22 +00:00
Charlie Turner 1a53996c31 Emit Tag_ABI_FP_16bit_format build attribute.
The __fp16 type is unconditionally exposed. Since -mfp16-format is not yet
supported, there is no user switch to change this behaviour. This build
attribute should capture the default behaviour of the compiler, which is to
expose the IEEE 754 version of __fp16.

Once -mfp16-format is supported, that will be the way to control the value of
this build attribute.

Change-Id: I8a46641ff0fd2ef8ad0af5f482a6d1af2ac3f6b0
llvm-svn: 224115
2014-12-12 11:59:18 +00:00
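
For illustration only, assuming an ARM target where Clang exposes the __fp16 extension type, code like the following is what the new Tag_ABI_FP_16bit_format attribute describes (IEEE 754 half precision by default):

  float widen(const __fp16 *x) {
    return *x; // loads a half-precision value and converts it to float
  }
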
Ekaterina Romanova 90ff20d8f5 A fix for PR21176.
DW_OP_const <const> doesn't describe a constant value, but a value at a constant address. 
The proper way to describe a constant value is DW_OP_constu <const>, DW_OP_stack_value. 
Added DW_OP_stack_value to the stack. 

Marked incorrect-variable-debugloc1.ll to xfail for PowerPC64, while the failure (PR21881)
is being investigated. 

llvm-svn: 224098
2014-12-12 05:11:47 +00:00
Steven Wu 881916dea5 Fix another infinite loop in InstCombine
Summary:
InstCombine goes into an infinite loop on the added testcase.
This happens because InstCombine generates instructions that it can
then optimize again itself. Fix by not optimizing frem if the optimized
type is the same as the original type.
rdar://problem/19150820

Reviewers: majnemer

Differential Revision: http://reviews.llvm.org/D6634

llvm-svn: 224097
2014-12-12 04:34:07 +00:00
Matt Arsenault 1e3a4ebc6e R600: Fix min/max matching problems with unordered compares
The returned operand needs to be permuted for the unordered
compares. Also fix incorrectly producing fmin_legacy / fmax_legacy
for f64, which don't exist.

llvm-svn: 224094
2014-12-12 02:30:37 +00:00
Matt Arsenault 477b178276 R600/SI: Don't promote f32 select to i32
This is nice for the instruction patterns, but it complicates
min / max matching. The select doesn't have the correct type and would
require looking through the bitcasts for the real float operands.

llvm-svn: 224092
2014-12-12 02:30:29 +00:00
Matt Arsenault 810cb62962 Add target hook for whether it is profitable to reduce load widths
Add an option to disable optimization to shrink truncated larger type
loads to smaller type loads. On SI this prevents using scalar load
instructions in some cases, since there are no scalar extloads.

llvm-svn: 224084
2014-12-12 00:00:24 +00:00
Duncan P. N. Exon Smith eca1e031d1 Bitcode: Use unsigned char to record MDStrings
`MDString`s can have arbitrary characters in them.  Prevent an assertion
that fired in `BitcodeWriter` because of sign extension by copying the
characters into the record as `unsigned char`s.

Based on a patch by Keno Fischer; fixes PR21882.

llvm-svn: 224077
2014-12-11 23:34:30 +00:00
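
A standalone C++ illustration (not LLVM code) of the sign-extension hazard the commit avoids; as noted in the comments, the behaviour of plain char depends on the target:

  #include <cstdint>
  #include <cstdio>

  int main() {
    char sc = '\xff';          // a high-bit byte, as may appear inside an MDString
    unsigned char uc = '\xff';
    // Where plain char is signed, widening sc sign-extends to all ones;
    // widening the unsigned char keeps just the byte value.
    uint64_t from_signed = static_cast<uint64_t>(sc);   // 0xffffffffffffffff
    uint64_t from_unsigned = static_cast<uint64_t>(uc); // 0x00000000000000ff
    std::printf("%llx\n%llx\n", (unsigned long long)from_signed,
                (unsigned long long)from_unsigned);
    return 0;
  }
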
Ahmed Bougacha 79c797443b [X86] Add a temporary testcase for PR21876/r223996.
llvm-svn: 224074
2014-12-11 23:07:52 +00:00
Duncan P. N. Exon Smith 5c7006e062 Bitcode: Add METADATA_NODE and METADATA_VALUE
This reflects the typelessness of `Metadata` in the bitcode format,
removing types from all metadata operands.

`METADATA_VALUE` represents a `ValueAsMetadata`, and always has two
fields: the type and the value.

`METADATA_NODE` represents an `MDNode`, and unlike `METADATA_OLD_NODE`,
doesn't store types.  It stores operands at their ID+1 so that `0` can
reference `nullptr` operands.

Part of PR21532.

llvm-svn: 224073
2014-12-11 23:02:24 +00:00
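
A minimal C++ sketch of the "operands at ID+1 so 0 can mean nullptr" encoding described above; these helpers are hypothetical and not the actual bitcode reader/writer code:

  #include <cstdint>

  // Encode an optional operand: a null operand becomes 0, a node with ID n becomes n + 1.
  uint64_t encodeOperand(bool isNull, uint64_t id) { return isNull ? 0 : id + 1; }

  // Decode a record field; returns false for a null operand, otherwise recovers the ID.
  bool decodeOperand(uint64_t field, uint64_t &id) {
    if (field == 0)
      return false;  // reserved value: nullptr operand
    id = field - 1;
    return true;
  }
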
Hal Finkel b5e9b0426a [PowerPC] Better lowering for add/or of a FrameIndex
If we have an add (or an or that is really an add), where one operand is a
FrameIndex and the other operand is a small constant, we can combine the
lowering of the FrameIndex (which is lowered as an add of the FI and a zero
offset) with the constant operand.

Amusingly, this is an old potential improvement entry from
lib/Target/PowerPC/README.txt which had never been resolved. In short, we used
to lower:

        %X = alloca { i32, i32 }
        %Y = getelementptr {i32,i32}* %X, i32 0, i32 1
        ret i32* %Y

as:

        addi 3, 1, -8
        ori 3, 3, 4
        blr

and now we produce:

        addi 3, 1, -4
        blr

which is much more sensible.

llvm-svn: 224071
2014-12-11 22:51:06 +00:00
Matt Arsenault 58d502f0d4 R600/SI: Use unordered equal instructions
llvm-svn: 224067
2014-12-11 22:15:43 +00:00
Matt Arsenault 8b989efaf9 R600/SI: Make more unordered comparisons legal
This saves a second compare and an and / or by using
the unordered comparison instructions.

llvm-svn: 224066
2014-12-11 22:15:39 +00:00
Matt Arsenault 9cded7a74b R600/SI: Use unordered not equal instructions
llvm-svn: 224065
2014-12-11 22:15:35 +00:00
Alexey Samsonov 4b7f413e3e [ASan] Change fake stack and local variables handling.
This commit changes the way we get fake stack from ASan runtime
(to find use-after-return errors) and the way we represent local
variables:
  - __asan_stack_malloc function now returns pointer to newly allocated
    fake stack frame, or NULL if frame cannot be allocated. It doesn't
    take pointer to real stack as an input argument, it is calculated
    inside the runtime.
  - __asan_stack_free function doesn't take pointer to real stack as
    an input argument. Now this function is never called if fake stack
    frame wasn't allocated.
  - __asan_init version is bumped to reflect changes in the ABI.
  - new flag "-asan-stack-dynamic-alloca" allows storing all the
    function local variables in a dynamic alloca, instead of the static
    one. It reduces the stack space usage in use-after-return mode
    (dynamic alloca will not be called if the local variables are stored
    in a fake stack), and improves the debug info quality for local
    variables (they will not be described relatively to %rbp/%rsp, which
    are assumed to be clobbered by function calls). This flag is turned
    off by default for now, but I plan to turn it on after more
    testing.

llvm-svn: 224062
2014-12-11 21:53:03 +00:00
David Majnemer 0a14c0ec9d AsmParser: Don't crash on an ill-formed MDNodeVector
llvm-svn: 224056
2014-12-11 20:51:54 +00:00
Andrea Di Biagio 72b05aa59c [InstCombine][X86] Improved folding of calls to Intrinsic::x86_sse4a_insertqi.
This patch teaches the instruction combiner how to fold a call to 'insertqi' if
the 'length field' (3rd operand) is set to zero, and if the sum between
field 'length' and 'bit index' (4th operand) is bigger than 64.

From the AMD64 Architecture Programmer's Manual:
1. If the sum of the bit index + length field is greater than 64, then the
   results are undefined;
2. A value of zero in the field length is defined as a length of 64.

This patch improves the existing combining logic for intrinsic 'insertqi'
adding extra checks to address both point 1. and point 2.

Differential Revision: http://reviews.llvm.org/D6583

llvm-svn: 224054
2014-12-11 20:44:59 +00:00
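
A standalone C++ restatement of the two manual rules quoted above (this is not the InstCombine code, just the checks it encodes):

  // Point 2: a zero length field is defined as a length of 64.
  unsigned effectiveLength(unsigned lengthField) {
    return lengthField == 0 ? 64 : lengthField;
  }

  // Point 1: if length + bit index exceeds 64, the insertqi result is undefined.
  bool resultIsUndefined(unsigned lengthField, unsigned bitIndex) {
    return effectiveLength(lengthField) + bitIndex > 64;
  }
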
David Majnemer 06f960d5d3 AsmParser: Don't crash on an ill-formed MDNodeVector
llvm-svn: 224053
2014-12-11 20:44:09 +00:00
Hal Finkel 13d104bf78 [PowerPC] Implement BuildSDIVPow2, lower i64 pow2 sdiv using sradi
PPCISelDAGToDAG contained existing code to lower i32 sdiv by a power-of-2 using
srawi/addze, but did not implement the i64 case. DAGCombine now contains a
callback specifically designed for this purpose (BuildSDIVPow2), and part of
the logic has been moved to an implementation of that callback. Doing this
lowering using BuildSDIVPow2 likely does not matter, compared to handling
everything in PPCISelDAGToDAG, for the positive divisor case, but the negative
divisor case, which generates an additional negation, can potentially benefit
from additional folding from DAGCombine. Now, both the i32 and the i64 cases
have been implemented.

Fixes PR20732.

llvm-svn: 224033
2014-12-11 18:37:52 +00:00
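
For reference, a portable C++ rendering of the signed divide-by-power-of-two expansion being lowered here (rounding toward zero); on PowerPC the rounding correction is what srawi's carry bit plus addze provide, and sradi covers the i64 case:

  #include <cstdint>

  int32_t sdiv_by_8(int32_t x) {
    // Add 7 only for negative inputs so the arithmetic shift rounds toward zero.
    int32_t bias = (x >> 31) & 7; // assumes arithmetic right shift of negative values
    return (x + bias) >> 3;
  }
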
Cameron McInally 5fb084e798 [AVX512] Add support for 512b variable bit shift intrinsics.
llvm-svn: 224028
2014-12-11 17:13:05 +00:00
Colin LeMahieu eb52f69f59 [Hexagon] Adding encoding information for sign extend word instruction.
llvm-svn: 224026
2014-12-11 16:43:06 +00:00
Elena Demikhovsky 908dbf48c8 AVX-512: Added all forms of COMPRESS instruction
+ intrinsics + tests

llvm-svn: 224019
2014-12-11 15:02:24 +00:00
Jozef Kolek a330a47427 [mips][microMIPS] Implement CodeGen support for LI16 instruction.
Differential Revision: http://reviews.llvm.org/D5840

llvm-svn: 224017
2014-12-11 13:56:23 +00:00
David Majnemer f532fcb889 InstSimplify: Remove usesless %a parameter from tests
No functional change intended.

llvm-svn: 224016
2014-12-11 12:56:17 +00:00
Michael Kuperstein fffb6996c9 The inliner needs to fix up debug information for llvm.dbg.declare, not only for llvm.dbg.value.
Patch by Amjad Aboud

Differential Revision: http://reviews.llvm.org/D6525

llvm-svn: 224015
2014-12-11 12:41:10 +00:00
Michael Kuperstein a1b1922827 Add newline missing in r224010.
llvm-svn: 224011
2014-12-11 11:30:20 +00:00
Michael Kuperstein 11165674dc [X86] When converting movs to pushes, don't assume MOVmi operand is an actual immediate
This should fix PR21878.

llvm-svn: 224010
2014-12-11 11:26:16 +00:00
Elena Demikhovsky fc081457f1 AVX-512: Fixed a bug in lowering setcc for MVT::i1 type
llvm-svn: 224008
2014-12-11 10:21:12 +00:00
Duncan P. N. Exon Smith 6ec9edf8ee IR: Canonicalize metadata formatting, NFC
Canonicalize formatting of metadata to make it easier to upgrade via
scripts -- in particular, one line per metadata definition makes it more
`sed`-able.

This is preparation for changing the assembly syntax for metadata [1].

[1]: http://lists.cs.uiuc.edu/pipermail/llvm-commits/Week-of-Mon-20141208/248449.html

llvm-svn: 224002
2014-12-11 06:32:29 +00:00
Ekaterina Romanova 75fd123967 Reverting commit 223981, because the test that I added (incorrect-variable-debugloc1.ll) failed for llvm-ppc64.
The test is failing for llvm-ppc64 because for this platform the location list is not being generated at all (most likely because of a bug in PPC code optimization or generation). I will file a bug against the PPC compiler, but meanwhile, until the PPC bug is fixed, I have to revert my change.

llvm-svn: 224000
2014-12-11 06:22:35 +00:00
Nick Lewycky dde76ff09c Fix LLVMContext to match the MDKind names that the LL parser permits. Fixes PR21799!
llvm-svn: 223995
2014-12-11 02:10:28 +00:00
Duncan P. N. Exon Smith f910e8a04d IR: Add 'invalid-' to test names for invalid assembly
Take the opportunity to sort these by `metadata`.

llvm-svn: 223993
2014-12-11 01:34:46 +00:00
Tim Northover 2ac7e4b3ee ARM: correctly expand LDR-lit based globals.
Quite a major error here: the expansions for the Pseudos with and without
folded load were mixed up. Fortunately it only affects ARM-mode, when not using
movw/movt, on Darwin. I'm guessing no-one actually uses that combination.

llvm-svn: 223986
2014-12-10 23:40:50 +00:00
Ekaterina Romanova ceeaba7932 A fix for PR21176.
DW_OP_const <const> doesn't describe a constant value, but a value at a constant address.
The proper way to describe a constant value is DW_OP_constu <const>, DW_OP_stack_value.

Added DW_OP_stack_value to the stack.

llvm-svn: 223981
2014-12-10 23:19:56 +00:00
Colin LeMahieu 220adb6370 [Hexagon] Adding combine ri/ir instructions.
llvm-svn: 223971
2014-12-10 22:23:07 +00:00
David Majnemer 5a7717e498 ConstantFold, InstSimplify: undef >>a x can be either -1 or 0, choose 0
Zero is usually a nicer constant to have than -1.

llvm-svn: 223969
2014-12-10 21:58:15 +00:00
David Majnemer 89cf6d79eb ConstantFold: an undef shift amount results in undef
X shifted by undef results in undef because the undef value can
represent values greater than the width of the operands.

llvm-svn: 223968
2014-12-10 21:38:05 +00:00
Colin LeMahieu db0b13cef0 [Hexagon] Adding encodings for JR class instructions. Updating compiler usages.
llvm-svn: 223967
2014-12-10 21:24:10 +00:00
Juergen Ributzka 2326650ceb [AArch64] MachO large code-model: Materialize FP constants in code.
In the large code model we have to first get the address of the GOT entry, load
the address of the constant, and then load the constant itself.

To avoid these loads and the GOT entry altogether, this commit changes the way
FP constants are materialized in the large code model. The constants are now
materialized in a GPR and then bitconverted/moved into the FPR.

Reviewed by Tim Northover

Fixes rdar://problem/16572564.

llvm-svn: 223941
2014-12-10 19:43:32 +00:00
Colin LeMahieu 8872d20788 [Hexagon] Adding JR class predicated call reg instructions.
llvm-svn: 223933
2014-12-10 18:24:16 +00:00
Sanjay Patel e20437f9af Match new shuffle codegen for MOVHPD patterns
Add patterns to match SSE (shufpd) and AVX (vpermilpd) shuffle codegen
when storing the high element of a v2f64. The existing patterns were
only checking for an unpckh type of shuffle. 

http://llvm.org/bugs/show_bug.cgi?id=21791

Differential Revision: http://reviews.llvm.org/D6586

llvm-svn: 223929
2014-12-10 16:58:54 +00:00
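
A small example, assuming SSE2 and <immintrin.h>, of storing the high element of a v2f64; the shuffle-plus-store below is the pattern that can now be matched to a single movhpd:

  #include <immintrin.h>

  void store_high(double *p, __m128d v) {
    __m128d hi = _mm_shuffle_pd(v, v, 1); // move element 1 down into element 0
    _mm_store_sd(p, hi);                  // scalar store of element 0
  }
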
David Majnemer 7b86b77248 ConstantFold: div undef, 0 should fold to undef, not zero
Dividing by zero yields an undefined value.

llvm-svn: 223924
2014-12-10 09:14:55 +00:00
David Majnemer ae707582c0 InstSimplify: [al]shr exact undef, %X -> undef
Exact shifts always keep the non-zero bits of their input.  This means
they keep their undef bits.

llvm-svn: 223923
2014-12-10 09:14:52 +00:00
Michael Kuperstein 0104ff6529 [X86] Make a code path in EltsFromConsecutiveLoads work only on vectors it expects
EltsFromConsecutiveLoads was apparently only ever called for 128-bit vectors, and assumed this implicitly. r223518 started calling it for AVX-sized vectors, causing the code path that had this assumption to crash.
This adds a check to make this path fire only for 128-bit vectors.

Differential Revision: http://reviews.llvm.org/D6579

llvm-svn: 223922
2014-12-10 08:46:12 +00:00
David Majnemer 71dc8fb867 InstSimplify: div %X, 0 -> undef
We already optimized rem %X, 0 to undef, we should do the same for div.

llvm-svn: 223919
2014-12-10 07:52:18 +00:00
David Majnemer 612f31284e DataLayout: Provide nicer diagnostics for malformed strings
llvm-svn: 223911
2014-12-10 02:36:41 +00:00
David Majnemer 6f3be2e155 AsmParser: Don't allow null bytes in BB labels
Since Value objects can't have null bytes in their name, we shouldn't
allow them in the labels of basic blocks.

llvm-svn: 223907
2014-12-10 02:10:35 +00:00
David Majnemer 2dc1b0f514 DataLayout: Be more verbose when diagnosing problems in pointer specs
llvm-svn: 223903
2014-12-10 01:38:28 +00:00
David Majnemer 84f798f0eb I didn't intend to commit this with r223898
llvm-svn: 223899
2014-12-10 01:17:48 +00:00
David Majnemer 5330c69bd1 DataLayout: Move asserts over to report_fatal_error
As indicated by the tests, it is possible to feed the AsmParser an
invalid datalayout string.  We should verify the result of parsing this
string regardless of whether or not we have assertions enabled.

llvm-svn: 223898
2014-12-10 01:17:08 +00:00
David Majnemer 1d681aa0ba AsmParser: Don't crash if a null byte is inside a quoted string
We don't allow Value* to have names which contain null bytes.  The
AsmParser should reject .ll files that try to do this.

llvm-svn: 223869
2014-12-10 00:43:17 +00:00
Ahmed Bougacha 7efbac74ec [ARM] Combine base-updating/post-incrementing vector load/stores.
We used to only combine intrinsics, and turn them into VLD1_UPD/VST1_UPD
when the base pointer is incremented after the load/store.

We can do the same thing for generic load/stores.

Note that we can only combine the first load/store+adds pair in
a sequence (as might be generated for a v16f32 load for instance),
because other combines turn the base pointer addition chain (each
computing the address of the next load, from the address of the last
load) into independent additions (common base pointer + this load's
offset).

Differential Revision: http://reviews.llvm.org/D6585

llvm-svn: 223862
2014-12-10 00:07:37 +00:00
David Majnemer 4cea5b4954 Forgot to add test for r223856
llvm-svn: 223857
2014-12-09 23:51:14 +00:00
Ahmed Bougacha 9d2d7c1b00 [ARM] Make testcase more explicit. NFC.
llvm-svn: 223841
2014-12-09 22:08:57 +00:00
Ahmed Bougacha be0b227679 [ARM] Also support v2f64 vld1/vst1.
It was missing from the VLD1/VST1 handling logic, even though the
corresponding instructions exist (same form as v2i64).

In preparation for a future patch.

llvm-svn: 223832
2014-12-09 21:25:00 +00:00
Colin LeMahieu b030c254c0 [Hexagon] Fixing broken tests.
llvm-svn: 223823
2014-12-09 20:36:53 +00:00
Colin LeMahieu 4af437fee5 [Hexagon] Updating rr/ri 32/64 transfer encodings and adding tests.
llvm-svn: 223821
2014-12-09 20:23:30 +00:00
Juergen Ributzka c6f314b8ed [FastISel][AArch64] Fix a missing nullptr check in 'computeAddress'.
The load/store value type is currently not available when lowering the memcpy
intrinsic. Add the missing nullptr check to support this in 'computeAddress'.

Fixes rdar://problem/19178947.

llvm-svn: 223818
2014-12-09 19:44:38 +00:00
Colin LeMahieu b580d7d8c8 [Hexagon] Adding word combine dot-new form and replacing old combine opcode.
llvm-svn: 223815
2014-12-09 19:23:45 +00:00
Chandler Carruth a7f247ea56 Revert r223764 which taught instcombine about integer-based elment extraction
patterns.

This is causing Clang to miscompile itself for 32-bit x86 somehow, and likely
also on ARM and PPC. I really don't know how, but reverting now that I've
confirmed this is actually the culprit. I have a reproduction as well and so
should be able to restore this shortly.

This reverts commit r223764.

Original commit log follows:
Teach instcombine to canonicalize "element extraction" from a load of an
integer and "element insertion" into a store of an integer into actual
element extraction, element insertion, and vector loads and stores.

Previously various parts of LLVM (including instcombine itself) would
introduce integer loads and stores into the code as a way of opaquely
loading and storing "bits". In some cases (such as a memcpy of
std::complex<float> object) we will eventually end up using those bits
in non-integer types. In order for SROA to effectively promote the
allocas involved, it splits these "store a bag of bits" integer loads
and stores up into the constituent parts. However, for non-alloca loads
and stores which remain, it uses integer math to recombine the values
into a large integer to load or store.

All of this would be "fine", except that it forces LLVM to go through
integer math to combine and split up values. While this makes perfect
sense for integers (and in fact is critical for bitfields to end up
lowering efficiently) it is *terrible* for non-integer types, especially
floating point types. We have a much more canonical way of representing
the act of concatenating the bits of two SSA values in LLVM: a vector
and insertelement. This patch teaches InstCombine to use this
representation.

With this patch applied, LLVM will no longer introduce integer math into
the critical path of every loop over std::complex<float> operations such
as those that make up the hot path of ... oh, most HPC code, Eigen, and
any other heavy linear algebra library.

For the record, I looked *extensively* at fixing this in other parts of
the compiler, but it just doesn't work:
- We really do want to canonicalize memcpy and other bit-motion to
  integer loads and stores. SSA values are tremendously more powerful
  than "copy" intrinsics. Not doing this regresses massive amounts of
  LLVM's scalar optimizer.
- We really do need to split up integer loads and stores of this form in
  SROA or every memcpy of a trivially copyable struct will prevent SSA
  formation of the members of that struct. It essentially turns off
  SROA.
- The closest alternative is to actually split the loads and stores when
  partitioning with SROA, but this has all of the downsides historically
  discussed of splitting up loads and stores -- the wide-store
  information is fundamentally lost. We would also see performance
  regressions for bitfield-heavy code and other places where the
  integers aren't really intended to be split without seemingly
  arbitrary logic to treat integers totally differently.
- We *can* effectively fix this in instcombine, so it isn't that hard of
  a choice to make IMO.

llvm-svn: 223813
2014-12-09 19:21:16 +00:00
David Majnemer 2defbada38 AsmParser: Don't crash on short hex constants for fp128 types
If we see 0xL01, treat it like 0xL00000000000000000000000000000001
instead of crashing.

llvm-svn: 223811
2014-12-09 19:10:03 +00:00
Robert Khasanov 8e8c39963d [AVX512] Added lowering for VBROADCASTSS/SD instructions.
Lowering patterns were written through the avx512_broadcast_pat multiclass, as the pattern generates VBROADCAST and COPY_TO_REGCLASS nodes.
Added lowering tests.

llvm-svn: 223804
2014-12-09 18:45:30 +00:00
Duncan P. N. Exon Smith 5bf8fef580 IR: Split Metadata from Value
Split `Metadata` away from the `Value` class hierarchy, as part of
PR21532.  Assembly and bitcode changes are in the wings, but this is the
bulk of the change for the IR C++ API.

I have a follow-up patch prepared for `clang`.  If this breaks other
sub-projects, I apologize in advance :(.  Help me compile it on Darwin and
I'll try to fix it.  FWIW, the errors should be easy to fix, so it may
be simpler to just fix it yourself.

This breaks the build for all metadata-related code that's out-of-tree.
Rest assured the transition is mechanical and the compiler should catch
almost all of the problems.

Here's a quick guide for updating your code:

  - `Metadata` is the root of a class hierarchy with three main classes:
    `MDNode`, `MDString`, and `ValueAsMetadata`.  It is distinct from
    the `Value` class hierarchy.  It is typeless -- i.e., instances do
    *not* have a `Type`.

  - `MDNode`'s operands are all `Metadata *` (instead of `Value *`).

  - `TrackingVH<MDNode>` and `WeakVH` referring to metadata can be
    replaced with `TrackingMDNodeRef` and `TrackingMDRef`, respectively.

    If you're referring solely to resolved `MDNode`s -- post graph
    construction -- just use `MDNode*`.

  - `MDNode` (and the rest of `Metadata`) have only limited support for
    `replaceAllUsesWith()`.

    As long as an `MDNode` is pointing at a forward declaration -- the
    result of `MDNode::getTemporary()` -- it maintains a side map of its
    uses and can RAUW itself.  Once the forward declarations are fully
    resolved RAUW support is dropped on the ground.  This means that
    uniquing collisions on changing operands cause nodes to become
    "distinct".  (This already happened fairly commonly, whenever an
    operand went to null.)

    If you're constructing complex (non self-reference) `MDNode` cycles,
    you need to call `MDNode::resolveCycles()` on each node (or on a
    top-level node that somehow references all of the nodes).  Also,
    don't do that.  Metadata cycles (and the RAUW machinery needed to
    construct them) are expensive.

  - An `MDNode` can only refer to a `Constant` through a bridge called
    `ConstantAsMetadata` (one of the subclasses of `ValueAsMetadata`).

    As a side effect, accessing an operand of an `MDNode` that is known
    to be, e.g., `ConstantInt`, takes three steps: first, cast from
    `Metadata` to `ConstantAsMetadata`; second, extract the `Constant`;
    third, cast down to `ConstantInt`.

    The eventual goal is to introduce `MDInt`/`MDFloat`/etc. and have
    metadata schema owners transition away from using `Constant`s when
    the type isn't important (and they don't care about referring to
    `GlobalValue`s).

    In the meantime, I've added transitional API to the `mdconst`
    namespace that matches semantics with the old code, in order to
    avoid adding the error-prone three-step equivalent to every call
    site.  If your old code was:

        MDNode *N = foo();
        bar(isa             <ConstantInt>(N->getOperand(0)));
        baz(cast            <ConstantInt>(N->getOperand(1)));
        bak(cast_or_null    <ConstantInt>(N->getOperand(2)));
        bat(dyn_cast        <ConstantInt>(N->getOperand(3)));
        bay(dyn_cast_or_null<ConstantInt>(N->getOperand(4)));

    you can trivially match its semantics with:

        MDNode *N = foo();
        bar(mdconst::hasa               <ConstantInt>(N->getOperand(0)));
        baz(mdconst::extract            <ConstantInt>(N->getOperand(1)));
        bak(mdconst::extract_or_null    <ConstantInt>(N->getOperand(2)));
        bat(mdconst::dyn_extract        <ConstantInt>(N->getOperand(3)));
        bay(mdconst::dyn_extract_or_null<ConstantInt>(N->getOperand(4)));

    and when you transition your metadata schema to `MDInt`:

        MDNode *N = foo();
        bar(isa             <MDInt>(N->getOperand(0)));
        baz(cast            <MDInt>(N->getOperand(1)));
        bak(cast_or_null    <MDInt>(N->getOperand(2)));
        bat(dyn_cast        <MDInt>(N->getOperand(3)));
        bay(dyn_cast_or_null<MDInt>(N->getOperand(4)));

  - A `CallInst` -- specifically, intrinsic instructions -- can refer to
    metadata through a bridge called `MetadataAsValue`.  This is a
    subclass of `Value` where `getType()->isMetadataTy()`.

    `MetadataAsValue` is the *only* class that can legally refer to a
    `LocalAsMetadata`, which is a bridged form of non-`Constant` values
    like `Argument` and `Instruction`.  It can also refer to any other
    `Metadata` subclass.

(I'll break all your testcases in a follow-up commit, when I propagate
this change to assembly.)

llvm-svn: 223802
2014-12-09 18:38:53 +00:00
David Majnemer b39e22bdc5 AsmParser: Don't crash on malformed attribute groups
This fixes PR21785.

llvm-svn: 223801
2014-12-09 18:33:57 +00:00
Colin LeMahieu 30dcb232b0 [Hexagon] Updating predicate register transfers and adding tstbit to allow select selection. Updating ll tests with predicate transfers that previously had nop encodings.
llvm-svn: 223800
2014-12-09 18:16:49 +00:00
Frederic Riss 7c78db5065 Correctly handle complex locations expressions in replaceDbgDeclareForAlloca()
replaceDbgDeclareForAlloca() replaces an alloca by a value storing the
address of what was the alloca. If there is a dbg.declare corresponding
to that alloca, we need to lower it to a dbg.value describing the additional
dereference operation to be performed to get to the underlying variable.
 This is done by adding a DW_OP_deref to the complex location part of the
location description. This deref was added to the end of the operation list,
which is wrong. The expression applies to what is described by the
dbg.{declare,value}, and as we are changing this, we need to apply the
DW_OP_deref as the first operation in the list.

Part of the fix for rdar://19162268.

llvm-svn: 223799
2014-12-09 17:55:48 +00:00
Frederic Riss 04aef05537 Revert "Initial dsymutil tool commit."
This reverts commit r223793. The review thread wasn't concluded.

llvm-svn: 223794
2014-12-09 17:21:50 +00:00
Frederic Riss 893c4f1e4d Initial dsymutil tool commit.
The goal of this tool is to replicate Darwin's dsymutil functionality
based on LLVM. dsymutil is a DWARF linker. Darwin's linker (ld64) does
not link the debug information; it leaves it in the object files in
relocatable form, but embeds a `debug map` into the executable that
describes where to find the debug information and how to relocate it.
When releasing/archiving a binary, dsymutil is called to link all the DWARF
information into a `dsym bundle` that can be distributed/stored along with
the binary.

With this commit, the LLVM based dsymutil is just able to parse the STABS
debug maps embedded by ld64 in linked binaries (and not all of them, for
example archives aren't supported yet).

Note that the tool directory is called dsymutil, but the executable is
currently called llvm-dsymutil. This discrepancy will disappear once the
tool is feature complete. At that point the executable will be renamed
to dsymutil, but until then you do not want it to override the system one.

    Differential Revision: http://reviews.llvm.org/D6242

llvm-svn: 223793
2014-12-09 17:03:30 +00:00
Bill Schmidt efe9ce216e [PowerPC 4/4] Enable little-endian support for VSX.
With the foregoing three patches, VSX instructions can be used for
little endian.  This patch removes the restriction that prevented
this, and re-enables the test cases from the first three patches.

llvm-svn: 223792
2014-12-09 16:59:57 +00:00
Bill Schmidt 3014435ca9 [PowerPC 3/4] Little-endian adjustments for VSX vector shuffle
When performing instruction selection for ISD::VECTOR_SHUFFLE, there
is special code for handling v2f64 and v2i64 using VSX instructions.
This code must be adjusted for little-endian.  Because the two inputs
are treated as a double-wide register, we must swap their order for
little endian.  To get the appropriate mask elements to use with the
big-endian biased XXPERMDI instruction, we must reverse their order
and invert the bits.

A new test is added to test the 16 possible values of the shuffle
mask.  It is initially disabled for reasons specified in the test.  It
is re-enabled by patch 4/4.

llvm-svn: 223791
2014-12-09 16:52:29 +00:00
Rafael Espindola a4f104b2ce Remember the unmangled name in the plugin.
This allows it to work with non-trivial manglings like the one in COFF.

Amusingly, this can be tested with gold, as emit-llvm causes the plugin to
exit before any COFF is generated.

llvm-svn: 223790
2014-12-09 16:50:57 +00:00
Bill Schmidt 4187962697 Add test cases that were inadvertently omitted from r223783 and r223788
llvm-svn: 223789
2014-12-09 16:44:58 +00:00
Robert Khasanov cbc5703aeb [AVX512] Added VPBROADCAST{BWDQ} (Load with Broadcast Integer Data from General Purpose Register) encodings for AVX512-BW/VL subsets
Added encoding tests.
        

llvm-svn: 223787
2014-12-09 16:38:41 +00:00
Juergen Ributzka c1bbcbbd32 [CodeGenPrepare] Split branch conditions into multiple conditional branches.
This optimization transforms code like:
bb1:
  %0 = icmp ne i32 %a, 0
  %1 = icmp ne i32 %b, 0
  %or.cond = or i1 %0, %1
  br i1 %or.cond, label %TrueBB, label %FalseBB

into multiple branch instructions like:

bb1:
  %0 = icmp ne i32 %a, 0
  br i1 %0, label %TrueBB, label %bb2
bb2:
  %1 = icmp ne i32 %b, 0
  br i1 %1, label %TrueBB, label %FalseBB

This optimization is already performed by SelectionDAG, but not by FastISel.
FastISel cannot perform this optimization, because it cannot generate new
MachineBasicBlocks.

Performing this optimization at CodeGenPrepare time makes it available to both
SelectionDAG and FastISel, and the implementation in SelectionDAG could be
removed. There are currently a few differences in codegen for X86 and PPC, so
this commit only enables it for FastISel.

Reviewed by Jim Grosbach

This fixes rdar://problem/19034919.

llvm-svn: 223786
2014-12-09 16:36:13 +00:00
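
An illustrative C++ source pattern (not from the patch) that front ends and early optimizations typically reduce to the "or of two icmps feeding a single branch" shape shown above:

  int either_nonzero(int a, int b) {
    // The bitwise | keeps both comparisons unconditional, so the condition
    // becomes a single or of two i1 values feeding one conditional branch.
    if ((a != 0) | (b != 0))
      return 1;
    return 0;
  }
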
Bill Schmidt fae5d71584 [PowerPC 1/4] Little-endian adjustments for VSX loads/stores
This patch addresses the inherent big-endian bias in the lxvd2x,
lxvw4x, stxvd2x, and stxvw4x instructions.  These instructions load
vector elements into registers left-to-right (with the first element
loaded into the high-order bits of the register), regardless of the
endian setting of the processor.  However, these are the only
vector memory instructions that permit unaligned storage accesses, so
we want to use them for little-endian.

To make this work, a lxvd2x or lxvw4x is replaced with an lxvd2x
followed by an xxswapd, which swaps the doublewords.  This works for
lxvw4x as well as lxvd2x, because for lxvw4x on an LE system the
vector elements are in LE order (right-to-left) within each
doubleword.  (Thus after lxvw4x of a <4 x float> the elements will
appear as 1, 0, 3, 2.  Following the swap, they will appear as 3, 2,
0, 1, as desired.)   For stores, an stxvd2x or stxvw4x is replaced
with an stxvd2x preceded by an xxswapd.

Introduction of extra swap instructions provides correctness, but
obviously is not ideal from a performance perspective.  Future patches
will address this with optimizations to remove most of the introduced
swaps, which have proven effective in other implementations.

The introduction of the swaps is performed during lowering of LOAD,
STORE, INTRINSIC_W_CHAIN, and INTRINSIC_VOID operations.  The latter
are used to translate intrinsics that specify the VSX loads and stores
directly into equivalent sequences for little endian.  Thus code that
uses vec_vsx_ld and vec_vsx_st does not have to be modified to be
ported from BE to LE.

We introduce new PPCISD opcodes for LXVD2X, STXVD2X, and XXSWAPD for
use during this lowering step.  In PPCInstrVSX.td, we add new SDType
and SDNode definitions for these (PPClxvd2x, PPCstxvd2x, PPCxxswapd).
These are recognized during instruction selection and mapped to the
correct instructions.

Several tests that were written to use -mcpu=pwr7 or pwr8 are modified
to disable VSX on LE variants because code generation changes with
this and subsequent patches in this set.  I chose to include all of
these in the first patch than try to rigorously sort out which tests
were broken by one or another of the patches.  Sorry about that.

The new test vsx-ldst-builtin-le.ll, and the changes to vsx-ldst.ll,
are disabled until LE support is enabled because of breakages that
occur as noted in those tests.  They are re-enabled in patch 4/4.

llvm-svn: 223783
2014-12-09 16:35:51 +00:00
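
A hedged example, assuming a POWER target built with VSX enabled and <altivec.h>, of the vec_vsx_ld/vec_vsx_st usage mentioned above; such code needs no source changes for little endian because the lowering inserts the xxswapd itself:

  #include <altivec.h>

  // n is assumed to be a multiple of 4.
  void add_in_place(float *a, const float *b, int n) {
    for (int i = 0; i < n; i += 4) {
      vector float va = vec_vsx_ld(0, a + i); // lxvw4x (plus an xxswapd on LE)
      vector float vb = vec_vsx_ld(0, b + i);
      vec_vsx_st(vec_add(va, vb), 0, a + i);  // xxswapd plus stxvw4x on LE
    }
  }
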
Chandler Carruth f57ac3bd22 [x86] Fix the test to actually test things for the CPU names, add the
missing barcelona CPU which that test uncovered, and remove the 32-bit
x86 CPUs which I really wasn't prepared to audit and test thoroughly.

If anyone wants to clean up the 32-bit only x86 CPUs, go for it.

Also, if anyone else wants to try to de-duplicate the AMD CPUs, that'd
be cool, but from the looks of it wouldn't save as much as it did for
the Intel CPUs.

llvm-svn: 223774
2014-12-09 14:25:55 +00:00
Asiri Rathnayake 7835e9b232 Fix modified immediate bug reported by MC Hammer.
Instructions of the form [ADD Rd, pc, #imm] are manually aliased
in processInstruction() to use ADR. To accomodate this, mod_imm handling
had to be tweaked a bit. Turns out it was the manual aliasing that must
be tweaked to accommodate mod_imms instead. More information about the
parsed instruction is available at the point where processInstruction()
is invoked, which makes it easier to detect a mod_imm at that point rather
than trying to detect a potential alias when a mod_imm is being prepped.
Added a test case and fixed some whitespace as well.

llvm-svn: 223772
2014-12-09 13:14:58 +00:00
Chandler Carruth 5303c6fc6c [x86] Add a test for the CPU names that should have been in r223769.
llvm-svn: 223770
2014-12-09 11:19:57 +00:00
Sonam Kumari 72ccc3c428 Removal Of Duplicate Test Cases and Addition Of Missing Check Statements
llvm-svn: 223768
2014-12-09 10:46:38 +00:00
Ankur Garg 51eeba70da [test/Transforms/InstCombine/shift.ll] Removed duplicate test cases. NFC.
Removed some duplicate test cases from the file /test/Transforms/InstCombine/shift.ll.

test54 and test57 were duplicates of each other.
test55 and test58 were duplicates of each other.

(Removed test57 and test58)

llvm-svn: 223767
2014-12-09 10:35:19 +00:00
Chandler Carruth 7415205113 Teach instcombine to canonicalize "element extraction" from a load of an
integer and "element insertion" into a store of an integer into actual
element extraction, element insertion, and vector loads and stores.

Previously various parts of LLVM (including instcombine itself) would
introduce integer loads and stores into the code as a way of opaquely
loading and storing "bits". In some cases (such as a memcpy of
std::complex<float> object) we will eventually end up using those bits
in non-integer types. In order for SROA to effectively promote the
allocas involved, it splits these "store a bag of bits" integer loads
and stores up into the constituent parts. However, for non-alloca loads
and stores which remain, it uses integer math to recombine the values
into a large integer to load or store.

All of this would be "fine", except that it forces LLVM to go through
integer math to combine and split up values. While this makes perfect
sense for integers (and in fact is critical for bitfields to end up
lowering efficiently) it is *terrible* for non-integer types, especially
floating point types. We have a much more canonical way of representing
the act of concatenating the bits of two SSA values in LLVM: a vector
and insertelement. This patch teaches InstCombine to use this
representation.

With this patch applied, LLVM will no longer introduce integer math into
the critical path of every loop over std::complex<float> operations such
as those that make up the hot path of ... oh, most HPC code, Eigen, and
any other heavy linear algebra library.

For the record, I looked *extensively* at fixing this in other parts of
the compiler, but it just doesn't work:
- We really do want to canonicalize memcpy and other bit-motion to
  integer loads and stores. SSA values are tremendously more powerful
  than "copy" intrinsics. Not doing this regresses massive amounts of
  LLVM's scalar optimizer.
- We really do need to split up integer loads and stores of this form in
  SROA or every memcpy of a trivially copyable struct will prevent SSA
  formation of the members of that struct. It essentially turns off
  SROA.
- The closest alternative is to actually split the loads and stores when
  partitioning with SROA, but this has all of the downsides historically
  discussed of splitting up loads and stores -- the wide-store
  information is fundamentally lost. We would also see performance
  regressions for bitfield-heavy code and other places where the
  integers aren't really intended to be split without seemingly
  arbitrary logic to treat integers totally differently.
- We *can* effectively fix this in instcombine, so it isn't that hard of
  a choice to make IMO.

Differential Revision: http://reviews.llvm.org/D6548

llvm-svn: 223764
2014-12-09 08:55:32 +00:00
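
A C++ sketch of the kind of source that hits this path: copying a std::complex<float> as raw bytes produces the integer load/store "bag of bits", and the element uses afterwards are what now become extractelement on a vector load instead of integer shift/trunc math:

  #include <complex>
  #include <cstring>

  float sum_parts(const std::complex<float> &src) {
    std::complex<float> tmp;
    std::memcpy(&tmp, &src, sizeof tmp); // the "bag of bits" copy: an i64 load and store
    return tmp.real() + tmp.imag();      // the two float halves are then extracted
  }
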
Michael Kuperstein c69bb43f35 [X86] Convert esp-relative movs of function arguments into pushes, step 1
This handles the simplest case for mov -> push conversion:
1. x86-32 calling convention, everything is passed through the stack.
2. There is no reserved call frame.
3. Only registers or immediates are pushed, no attempt to combine a mem-reg-mem sequence into a single PUSHmm.

Differential Revision: http://reviews.llvm.org/D6503

llvm-svn: 223757
2014-12-09 06:10:44 +00:00
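
An illustrative x86-32 example (function names invented) of the case handled: with everything passed on the stack and no reserved call frame, the immediate stores into the outgoing-argument area can become pushes:

  int callee(int a, int b, int c);

  int caller() {
    return callee(1, 2, 3); // previously three mov-immediate stores to stack slots,
                            // now emitted as three pushes
  }
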
David Majnemer 598bd05bd7 Reland r223754
The commit is identical except a reference to `GV' should have been to
`GVal'.

llvm-svn: 223756
2014-12-09 05:56:09 +00:00
David Majnemer 8d3e580cc7 Revert "AsmParser: Reject invalid mismatch between forward ref and def"
This reverts commit r223754.  I've upset the buildbots.

llvm-svn: 223755
2014-12-09 05:50:11 +00:00
David Majnemer e9efecaa52 AsmParser: Reject invalid mismatch between forward ref and def
Don't assume that the forward referenced entity was of the same
global-kind as the new entity.

This fixes PR21779.

llvm-svn: 223754
2014-12-09 05:43:56 +00:00
Hal Finkel c8cf2b88bc Handle early-clobber registers in the aggressive anti-dep breaker
The aggressive anti-dep breaker, used by the PowerPC backend during post-RA
scheduling (but available to all targets), did not handle early-clobber MI
operands (at all). When constructing the list of available registers for the
replacement of some def operand, check the using instructions, and remove
registers assigned to early-clobbered defs from the set.

Fixes PR21452.

llvm-svn: 223727
2014-12-09 01:00:59 +00:00
Tom Stellard 3e01d47d98 MISched: Fix moving stores across barriers
This fixes an issue with ScheduleDAGInstrs::buildSchedGraph
where stores without an underlying object would not be added
as a predecessor to the current BarrierChain.

llvm-svn: 223717
2014-12-08 23:36:48 +00:00
Colin LeMahieu f5b4d655d2 [Hexagon] Adding any8, all8, and/or/xor/andn/orn/not predicate register forms, mask, and vitpack instructions and patterns.
llvm-svn: 223710
2014-12-08 23:07:59 +00:00
Hal Finkel aa10b3caaf [PowerPC] Don't use a non-allocatable register to implement the 'cc' alias
GCC accepts 'cc' as an alias for 'cr0', and we need to do the same when
processing inline asm constraints. This had previously been implemented using a
non-allocatable register, named 'cc', that was listed as an alias of 'cr0', but
the infrastructure does not seem to support this properly (neither the register
allocator nor the scheduler properly accounts for the alias). Instead, we can
just process this as a naming alias inside of the inline asm
constraint-processing code, so we'll do that instead.

There are two regression tests, one where the post-RA scheduler did the wrong
thing with the non-allocatable alias, and one where the register allocator did
the wrong thing. Fixes PR21742.

llvm-svn: 223708
2014-12-08 22:54:22 +00:00
Colin LeMahieu df96b071f1 [Hexagon] Fixing broken test.
llvm-svn: 223704
2014-12-08 22:29:06 +00:00
Colin LeMahieu b6c4dd96f9 [Hexagon] Adding xtype doubleword add, sub, and, or, xor and patterns.
llvm-svn: 223702
2014-12-08 22:19:14 +00:00
Colin LeMahieu 9bfe5473da [Hexagon] Adding xtype doubleword comparisons. Removing unused multiclass.
llvm-svn: 223701
2014-12-08 21:56:47 +00:00
Colin LeMahieu 025f860638 [Hexagon] Adding xtype parity, min, minu, max, maxu instructions.
llvm-svn: 223693
2014-12-08 21:19:18 +00:00
Colin LeMahieu 8d1376c60e [Hexagon] Adding xtype halfword add/sub ll/hl/lh/hh/sat/<<16 instructions.
llvm-svn: 223692
2014-12-08 20:33:01 +00:00
David Majnemer 770fd82f39 ConstantFold: Zero-sized globals might land on top of another global
A zero-sized array occupies no space and might share its address with
another global.

llvm-svn: 223684
2014-12-08 19:35:31 +00:00