Commit Graph

8521 Commits

Author SHA1 Message Date
Craig Topper bae0e9ea1d Make XOP and FMA4 require SSE4A to match GCC behavior. Use this to simplify Bulldozer feature list.
llvm-svn: 155897
2012-05-01 06:54:48 +00:00
Craig Topper d32ebcc36b Attempt to handle MRMInitReg in emitVEXOpcodePrefix. Hopefully fixes PR12711.
llvm-svn: 155896
2012-05-01 06:34:01 +00:00
Craig Topper 43518cc55f Make XOP imply AVX as its needed to legalize the registers types.
llvm-svn: 155891
2012-05-01 05:41:41 +00:00
Craig Topper c0cef32b83 Remove HasSSE2 from AES and CLMUL predicates. It's now implied by the HasAES and HasCLMUL predicates.
llvm-svn: 155890
2012-05-01 05:35:02 +00:00
Craig Topper 29dd148a71 Make CLMUL and AES imply SSE2 since its needed to legalize the type.
llvm-svn: 155888
2012-05-01 05:28:32 +00:00
Craig Topper 0eacda5f69 Enable AVX and FMA4 for AMD Bulldozer processors.
llvm-svn: 155885
2012-05-01 05:18:13 +00:00
Manman Ren 4f4d5c8fc8 X86: optimization for -(x != 0)
This patch will optimize -(x != 0) on X86
FROM 
cmpl	$0x01,%edi
sbbl	%eax,%eax
notl	%eax
TO
negl %edi
sbbl %eax %eax

llvm-svn: 155853
2012-04-30 22:51:25 +00:00
Chad Rosier d427d51c2b Tidy up. No functional change intended.
llvm-svn: 155832
2012-04-30 17:47:15 +00:00
Derek Schuff b051adf263 Fix fastcc structure return with fast-isel on x86-32
On x86-32, structure return via sret lets the callee pop the hidden
pointer argument off the stack, which the caller then re-pushes.
However if the calling convention is fastcc, then a register is used
instead, and the caller should not adjust the stack. This is
implemented with a check of IsTailCallConvention
X86TargetLowering::LowerCall but is now checked properly in
X86FastISel::DoSelectCall.

(this time, actually commit what was reviewed!)

llvm-svn: 155825
2012-04-30 16:57:15 +00:00
Craig Topper 55b3990837 No need to normalize index before calling Extract128BitVector
llvm-svn: 155811
2012-04-30 05:17:10 +00:00
Pete Cooper f76b5fe5ab Copied all the VEX prefix encoding code from X86MCCodeEmitter to the x86 JIT emitter. Needs some major refactoring as these two code emitters are almost identical
llvm-svn: 155810
2012-04-30 03:56:44 +00:00
Jakub Staszak da03f3ba64 Remove unneeded casts. No functionality change.
llvm-svn: 155800
2012-04-29 20:52:53 +00:00
Craig Topper 3b94fa63d6 Simplify code a bit. No functional change intended.
llvm-svn: 155798
2012-04-29 20:22:05 +00:00
Derek Schuff a99b168145 Revert r155745
llvm-svn: 155746
2012-04-27 23:37:41 +00:00
Derek Schuff bbf8b83e90 Fix fastcc structure return with fast-isel on x86-32
On x86-32, structure return via sret lets the callee pop the hidden
pointer argument off the stack, which the caller then re-pushes.
However if the calling convention is fastcc, then a register is used
instead, and the caller should not adjust the stack. This is
implemented with a check of IsTailCallConvention
X86TargetLowering::LowerCall but is now checked properly in
X86FastISel::DoSelectCall.

llvm-svn: 155745
2012-04-27 23:27:17 +00:00
Craig Topper 0fa6c7e593 Use 'unsigned' instead of 'int' in several places when retrieving number of vector elements.
llvm-svn: 155742
2012-04-27 22:54:43 +00:00
Chad Rosier 32c2178ef3 Add x86-specific DAG combine to simplify:
x == -y --> x+y == 0
 x != -y --> x+y != 0

On x86, the generated code goes from
   negl    %esi
   cmpl    %esi, %edi
   je    .LBB0_2
to
   addl    %esi, %edi
   je    .L4

This case is correctly handled for ARM with "cmn".

Patch by Manman Ren.
rdar://11245199
PR12545

llvm-svn: 155739
2012-04-27 22:33:25 +00:00
Craig Topper 42cd8d2c00 Tidy up spacing.
llvm-svn: 155733
2012-04-27 21:05:09 +00:00
Benjamin Kramer 913da4b261 X86: Don't emit conditional floating point moves on when targeting pre-pentiumpro architectures.
* Model FPSW (the FPU status word) as a register.
* Add ISel patterns for the FUCOM*, FNSTSW and SAHF instructions.
* During Legalize/Lowering, build a node sequence to transfer the comparison
result from FPSW into EFLAGS. If you're wondering about the right-shift: That's
an implicit sub-register extraction (%ax -> %ah) which is handled later on by
the instruction selector.

Fixes PR6679. Patch by Christoph Erhardt!

llvm-svn: 155704
2012-04-27 12:07:43 +00:00
Preston Gurd 81290f4be5 Trivial change to set UseLeaForSP flag in addition to toggling
the FeatureLeaForSP feature bit when llvm auto detects Intel Atom.

Patch by Andy Zhang

llvm-svn: 155655
2012-04-26 19:52:27 +00:00
Craig Topper 08ccfbe57b Enable detection of AVX and AVX2 support through CPUID. Add AVX/AVX2 to corei7-avx, core-avx-i, and core-avx2 cpu names.
llvm-svn: 155618
2012-04-26 06:40:15 +00:00
Craig Topper 5ff6dc34b9 Use vector_shuffles instead of target specific unpack nodes for AVX ZERO_EXTEND/ANY_EXTEND combine. These will be converted to target specific nodes during lowering. This is more consistent with other code.
llvm-svn: 155537
2012-04-25 06:39:39 +00:00
Nadav Rotem 810734b7f4 AVX: Add additional vbroadcast replacement sequences for integers.
Remove the v2f64 patterns because it does not match any vbroadcast
instruction.

llvm-svn: 155461
2012-04-24 18:09:59 +00:00
Nadav Rotem 7b7b99c74a AVX2: The BLENDPW instruction selects between vectors of v16i16 using an i8
immediate. We can't use it here because the shuffle code does not check that
the lower part of the word is identical to the upper part.

llvm-svn: 155440
2012-04-24 11:27:53 +00:00
Nadav Rotem aa3ff8da00 AVX: We lower VECTOR_SHUFFLE and BUILD_VECTOR nodes into vbroadcast instructions
using the pattern (vbroadcast (i32load src)). In some cases, after we generate
this pattern new users are added to the load node, which prevent the selection
of the blend pattern. This commit provides fallback patterns which perform
in-vector broadcast (using in-vector vbroadcast in AVX2 and pshufd on AVX1).

llvm-svn: 155437
2012-04-24 11:07:03 +00:00
Craig Topper 0b65c40821 Remove dangling spaces. Fix some other formatting.
llvm-svn: 155429
2012-04-24 06:36:35 +00:00
Craig Topper 6f2a535de2 Simplify code a bit and make it compile better. Remove unused parameters.
llvm-svn: 155428
2012-04-24 06:02:29 +00:00
Nadav Rotem 3f8acfc3c4 Optimize the vector UINT_TO_FP, SINT_TO_FP and FP_TO_SINT operations where the integer type is i8 (commonly used in graphics).
llvm-svn: 155397
2012-04-23 21:53:37 +00:00
Preston Gurd 9a0914753a This patch fixes a problem which arose when using the Post-RA scheduler
on X86 Atom. Some of our tests failed because the tail merging part of
the BranchFolding pass was creating new basic blocks which did not
contain live-in information. When the anti-dependency code in the Post-RA
scheduler ran, it would sometimes rename the register containing
the function return value because the fact that the return value was
live-in to the subsequent block had been lost. To fix this, it is necessary
to run the RegisterScavenging code in the BranchFolding pass.

This patch makes sure that the register scavenging code is invoked
in the X86 subtarget only when post-RA scheduling is being done.
Post RA scheduling in the X86 subtarget is only done for Atom.

This patch adds a new function to the TargetRegisterClass to control
whether or not live-ins should be preserved during branch folding.
This is necessary in order for the anti-dependency optimizations done
during the PostRASchedulerList pass to work properly when doing
Post-RA scheduling for the X86 in general and for the Intel Atom in particular.

The patch adds and invokes the new function trackLivenessAfterRegAlloc()
instead of using the existing requiresRegisterScavenging().
It changes BranchFolding.cpp to call trackLivenessAfterRegAlloc() instead of
requiresRegisterScavenging(). It changes the all the targets that
implemented requiresRegisterScavenging() to also implement
trackLivenessAfterRegAlloc().  

It adds an assertion in the Post RA scheduler to make sure that post RA
liveness information is available when it is needed.

It changes the X86 break-anti-dependencies test to use –mcpu=atom, in order
to avoid running into the added assertion.

Finally, this patch restores the use of anti-dependency checking
(which was turned off temporarily for the 3.1 release) for
Intel Atom in the Post RA scheduler.

Patch by Andy Zhang!

Thanks to Jakob and Anton for their reviews.

llvm-svn: 155395
2012-04-23 21:39:35 +00:00
Craig Topper 153bb34a3c Use MVT instead of EVT through all of LowerVECTOR_SHUFFLEtoBlend and not just the switch. Saves a little bit of binary size.
llvm-svn: 155339
2012-04-23 07:36:33 +00:00
Craig Topper 0a2c809d09 Make getZeroVector and getOnesVector more alike as far as how they detect 128-bit versus 256-bit vectors. Be explicit about both sizes and use llvm_unreachable. Similar changes to getLegalSplat.
llvm-svn: 155337
2012-04-23 07:24:41 +00:00
Craig Topper 2bbe8bcf4e Tidy up by removing some 'else' after 'return'
llvm-svn: 155336
2012-04-23 06:57:04 +00:00
Craig Topper 5c51eeecfc Tidy up spacing in LowerVECTOR_SHUFFLEtoBlend. Remove code that checks if shuffle operand has a different type than the the shuffle result since it can never happen.
llvm-svn: 155333
2012-04-23 06:38:28 +00:00
Craig Topper a52f0d09b6 Add a couple llvm_unreachables.
llvm-svn: 155332
2012-04-23 03:42:40 +00:00
Craig Topper 984dc015ae Remove some tab characers.
llvm-svn: 155331
2012-04-23 03:28:34 +00:00
Craig Topper ea428fd79c Remove some 'else' after 'return'. No functional change.
llvm-svn: 155330
2012-04-23 03:26:18 +00:00
Craig Topper bf7d5666f0 Make Extract128BitVector and Insert128BitVector take an unsigned instead of an ConstantNode SDValue. getConstant was almost always called just before only to have the functions take it apart and build a new ConstantSDNode.
llvm-svn: 155325
2012-04-22 20:55:18 +00:00
Craig Topper 2d474d6d92 Convert getNode(UNDEF) to getUNDEF.
llvm-svn: 155321
2012-04-22 19:29:34 +00:00
Craig Topper 860ed0d20a Make calls to getVectorShuffle more consistent. Use shuffle VT for calls to getUNDEF instead of requerying. Use &Mask[0] instead of Mask.data().
llvm-svn: 155320
2012-04-22 19:17:57 +00:00
Craig Topper 43397c0900 Tidy up. 80 columns and argument alignment.
llvm-svn: 155319
2012-04-22 18:51:37 +00:00
Craig Topper ad56a744f1 Simplify code by converting multiple places that were manually concatenating 128-bit vectors to use either CONCAT_VECTORS or a helper function. CONCAT_VECTORS will itself be lowered to the same pattern as before. The helper function is needed for concats of BUILD_VECTORs since getNode(CONCAT_VECTORS) will just return a large BUILD_VECTOR and we may be trying to lower large BUILD_VECTORS when this occurs.
llvm-svn: 155318
2012-04-22 18:15:59 +00:00
Elena Demikhovsky 8d7e56c409 ZERO_EXTEND/SIGN_EXTEND/TRUNCATE optimization for AVX2
llvm-svn: 155309
2012-04-22 09:39:03 +00:00
Craig Topper 6eadae8e60 Make some fixed arrays const. Use array_lengthof in a couple places instead of a hardcoded number.
llvm-svn: 155294
2012-04-21 18:58:38 +00:00
Craig Topper 2568bf3089 Tidy up. 80 columns and some other spacing issues.
llvm-svn: 155291
2012-04-21 18:13:35 +00:00
Craig Topper abadc660e0 Convert some uses of XXXRegisterClass to &XXXRegClass. No functional change since they are equivalent.
llvm-svn: 155186
2012-04-20 06:31:50 +00:00
Kevin Enderby ec4bd31206 Fixed the llvm-mv X86 disassembler so the 'C' API gets jumps properly
symbolicated.  These have and operand type of TYPE_RELv which was not handled
as isBranch in translateImmediate() in X86Disassembler.cpp.  rdar://11268426 

llvm-svn: 155074
2012-04-18 23:12:11 +00:00
Craig Topper d3c9e404ba Remove AVX vpermil intrinsics. I removed their uses from clang headers and builtins a while back.
llvm-svn: 154985
2012-04-18 05:24:00 +00:00
Craig Topper 354103d8ca Don't decode vperm2i128 or vperm2f128 into a shuffle if bit 3 or 7 of the immediate is set.
llvm-svn: 154907
2012-04-17 05:54:54 +00:00
Preston Gurd 5333e2e5ce Temporarily turn off anti-dependency checking
during Post RA scheduling in X86,
until the X86 target is changed to properly set up
post RA liveness.

llvm-svn: 154874
2012-04-16 22:52:28 +00:00
Richard Smith 12da79b859 Fix incorrect atomics codegen introduced in r154705, and extend test to catch it.
llvm-svn: 154845
2012-04-16 18:43:53 +00:00
Craig Topper 4badeb3f0d Replace vpermd/vpermps intrinic patterns with custom lowering to target specific nodes.
llvm-svn: 154801
2012-04-16 07:13:00 +00:00
Craig Topper 26d7a94981 Change type profile for vpermv back to using operand type for the mask argument to match intrinsic behavior. Add a bitcast to the lowering code to convert mask from v8i32 to v8f32 for vpermps.
llvm-svn: 154798
2012-04-16 06:43:40 +00:00
Craig Topper c0075aa7ff Flip the arguments when converting vpermd/vpermps intrinsics into instructions. The intrinsic has the mask as the last operand, but the instruction has it as the second.
llvm-svn: 154797
2012-04-16 06:26:15 +00:00
Craig Topper b86fa404d3 Merge vpermps/vpermd and vpermpd/vpermq SD nodes.
llvm-svn: 154782
2012-04-16 00:41:45 +00:00
Craig Topper b04fe34030 Fix SDTypeProfile for vpermps. The mask operand should be v8i32.
llvm-svn: 154781
2012-04-16 00:12:20 +00:00
Craig Topper 1f8c9eb925 Spacing fixes and 80 column fixes. Use 0 instead of 0x80 for undef indices in vpermps/vpermd. Hardware only looks at lower 3-bits.
llvm-svn: 154780
2012-04-15 23:48:57 +00:00
Craig Topper bfc9a5f7d3 Remove AVX2 vpermq and vpermpd intrinsics. These can now be handled with normal shuffle vectors.
llvm-svn: 154778
2012-04-15 22:43:31 +00:00
Nadav Rotem 42bcd04ee3 Fix PR12529. The Vxx family of instructions are only supported by AVX.
Use non-vex instructions for SSE4.

llvm-svn: 154770
2012-04-15 19:36:44 +00:00
Elena Demikhovsky 779a72b49e Added VPERM optimization for AVX2 shuffles
llvm-svn: 154761
2012-04-15 11:18:59 +00:00
Richard Smith 3e8f1f6aea Fix X86 codegen for 'atomicrmw nand' to generate *x = ~(*x & y), not *x = ~*x & y.
llvm-svn: 154705
2012-04-13 22:47:00 +00:00
Evan Cheng 3e869f002c Generalize r153635 to deal with TokenFactor chains; also clean up the logic and fix the tests. rdar://11069732, rdar://11236106
llvm-svn: 154604
2012-04-12 19:14:21 +00:00
Craig Topper d0271b27cb Fix 128-bit ptest intrinsics to take v2i64 instead of v4f32 since these are integer instructions.
llvm-svn: 154580
2012-04-12 07:23:00 +00:00
Nadav Rotem 372cf15125 remove unused argument
llvm-svn: 154494
2012-04-11 11:05:21 +00:00
Nadav Rotem 9bc178ac5c Reapply 154396 after fixing a test.
Original message:
Modify the code that lowers shuffles to blends from using blendvXX to vblendXX.
blendV uses a register for the selection while Vblend uses an immediate.
On sandybridge they still have the same latency and execute on the same execution ports.

llvm-svn: 154483
2012-04-11 06:40:27 +00:00
Charles Davis 74c282b5ef Add retw and lretw instructions. Also, fix Intel syntax parsing for all
ret instructions.

llvm-svn: 154468
2012-04-11 01:10:53 +00:00
Chad Rosier f7345b027a Whitespace.
llvm-svn: 154427
2012-04-10 19:42:07 +00:00
Chad Rosier 235a7a1746 Revert r154396, which looks to be the real culprit behind the bot failures.
llvm-svn: 154426
2012-04-10 19:39:18 +00:00
Eric Christopher 65ada95b84 Temporarily revert this patch to see if it brings the buildbots back.
llvm-svn: 154425
2012-04-10 19:33:16 +00:00
David Blaikie 2735136655 Remove unused variable.
llvm-svn: 154398
2012-04-10 15:23:13 +00:00
Nadav Rotem f934f91709 Modify the code that lowers shuffles to blends from using blendvXX to vblendXX.
blendv uses a register for the selection while vblend uses an immediate.
On sandybridge they still have the same latency and execute on the same execution ports.

llvm-svn: 154396
2012-04-10 14:33:13 +00:00
Evan Cheng f8bad08001 Fix a long standing tail call optimization bug. When a libcall is emitted
legalizer always use the DAG entry node. This is wrong when the libcall is
emitted as a tail call since it effectively folds the return node. If
the return node's input chain is not the entry (i.e. call, load, or store)
use that as the tail call input chain.

PR12419
rdar://9770785
rdar://11195178

llvm-svn: 154370
2012-04-10 01:51:00 +00:00
Preston Gurd 2eec367227 This patch adds X86 instruction itineraries, which were missed by the
original patch to add itineraries, to X86InstrArithmetc.td.  

llvm-svn: 154320
2012-04-09 15:32:22 +00:00
Nadav Rotem fb7e2ae53c Lower some x86 shuffle sequences to the vblend family of instructions.
llvm-svn: 154313
2012-04-09 08:33:21 +00:00
Nadav Rotem b801ca3976 Fix a bug in the lowering of broadcasts: ConstantPools need to use the target pointer type.
Move NormalizeVectorShuffle and LowerVectorBroadcast into X86TargetLowering.

llvm-svn: 154310
2012-04-09 07:45:58 +00:00
Chandler Carruth 3779ac10b4 Cleanup and relax a restriction on the matching of global offsets into
x86 addressing modes. This allows PIE-based TLS offsets to fit directly
into an addressing mode immediate offset, which is the last remaining
code quality issue from PR12380. With this patch, that PR is completely
fixed.

To understand why this patch is correct to match these offsets into
addressing mode immediates, break it down by cases:
1) 32-bit is trivially correct, and unmodified here.
2) 64-bit non-small mode is unchanged and never matches.
3) 64-bit small PIC code which is RIP-relative is handled specially in
   the match to try to fit RIP into the base register. If it fails, it
   now early exits. This behavior is unchanged by the patch.
4) 64-bit small non-PIC code which is not RIP-relative continues to work
   as it did before. The reason these immediates are safe is because the
   ABI ensures they fit in small mode. This behavior is unchanged.
5) 64-bit small PIC code which is *not* using RIP-relative addressing.
   This is the only case changed by the patch, and the primary place you
   see it is in TLS, either the win64 section offset TLS or Linux
   local-exec TLS model in a PIC compilation. Here the ABI again ensures
   that the immediates fit because we are in small mode, and any other
   operations required due to the PIC relocation model have been handled
   externally to the Wrapper node (extra loads etc are made around the
   wrapper node in ISelLowering).

I've tested this as much as I can comparing it with GCC's output, and
everything appears safe. I discussed this with Anton and it made sense
to him at least at face value. That said, if there are issues with PIC
code after this patch, yell and we can revert it.

llvm-svn: 154304
2012-04-09 02:13:06 +00:00
Chandler Carruth 16f0ebcbb5 Move the TLSModel information into the TargetMachine rather than hiding
in TargetLowering. There was already a FIXME about this location being
odd. The interface is simplified as a consequence. This will also make
it easier to change TLS models when compiling with PIE.

llvm-svn: 154292
2012-04-08 17:20:55 +00:00
Nadav Rotem 82609df647 AVX2: Build splat vectors by broadcasting a scalar from the constant pool.
Previously we used three instructions to broadcast an immediate value into a
vector register.
On Sandybridge we continue to load the broadcasted value from the constant pool.

llvm-svn: 154284
2012-04-08 12:54:54 +00:00
Craig Topper d024cef233 Turn avx2 vinserti128 intrinsic calls into INSERT_SUBVECTOR DAG nodes and remove patterns for selecting the intrinsic. Similar was already done for avx1.
llvm-svn: 154272
2012-04-07 22:32:29 +00:00
Craig Topper aa9aab5ad2 Move vinsertf128 patterns near the instruction definitions. Add AddedComplexity to AVX2 vextracti128 patterns to give them priority over the integer versions of vextractf128 patterns.
llvm-svn: 154268
2012-04-07 21:57:43 +00:00
NAKAMURA Takumi b95f64134e Target/X86/MCTargetDesc/X86MCAsmInfo.cpp: Enable DwarfCFI (aka DW2) on Cygming.
Cygwin-1.7 supports dw2. Some recent mingw distros support one, too.
I have confirmed test-suite/SingleSource/Benchmarks/Shootout-C++/except.cpp can pass on Cygwin.

llvm-svn: 154247
2012-04-07 02:24:20 +00:00
Benjamin Kramer 3cacabfb04 Fix narrowing conversion.
llvm-svn: 154171
2012-04-06 13:33:52 +00:00
Craig Topper 447417c932 Allow 256-bit shuffles to be split if a 128-bit lane contains elements from a single source. This is a rewrite of the 256-bit shuffle splitting code based on similar code from legalize types. Fixes PR12413.
llvm-svn: 154166
2012-04-06 07:45:23 +00:00
Rafael Espindola ba0a6cabb8 Always compute all the bits in ComputeMaskedBits.
This allows us to keep passing reduced masks to SimplifyDemandedBits, but
know about all the bits if SimplifyDemandedBits fails. This allows instcombine
to simplify cases like the one in the included testcase.

llvm-svn: 154011
2012-04-04 12:51:34 +00:00
Craig Topper 7629d63bc4 Add support for AVX enhanced comparison predicates. Patch from Kay Tiong Khoo.
llvm-svn: 153935
2012-04-03 05:20:24 +00:00
Benjamin Kramer 1c0541b031 Move getOpcodeName from the various target InstPrinters into the superclass MCInstPrinter.
All implementations used the same code.

llvm-svn: 153866
2012-04-02 08:32:38 +00:00
Craig Topper dab9e35ad0 Remove getInstructionName from MCInstPrinter implementations in favor of using the instruction name table from MCInstrInfo. Reduces static data in the InstPrinter implementations.
llvm-svn: 153863
2012-04-02 07:01:04 +00:00
Craig Topper 54bfde79db Make MCInstrInfo available to the MCInstPrinter. This will be used to remove getInstructionName and the static data it contains since the same tables are already in MCInstrInfo.
llvm-svn: 153860
2012-04-02 06:09:36 +00:00
Nadav Rotem b078350872 This commit contains a few changes that had to go in together.
1. Simplify xor/and/or (bitcast(A), bitcast(B)) -> bitcast(op (A,B))
   (and also scalar_to_vector).

2. Xor/and/or are indifferent to the swizzle operation (shuffle of one src).
   Simplify xor/and/or (shuff(A), shuff(B)) -> shuff(op (A, B))

3. Optimize swizzles of shuffles:  shuff(shuff(x, y), undef) -> shuff(x, y).

4. Fix an X86ISelLowering optimization which was very bitcast-sensitive.

Code which was previously compiled to this:

movd    (%rsi), %xmm0
movdqa  .LCPI0_0(%rip), %xmm2
pshufb  %xmm2, %xmm0
movd    (%rdi), %xmm1
pshufb  %xmm2, %xmm1
pxor    %xmm0, %xmm1
pshufb  .LCPI0_1(%rip), %xmm1
movd    %xmm1, (%rdi)
ret

Now compiles to this:

movl    (%rsi), %eax
xorl    %eax, (%rdi)
ret

llvm-svn: 153848
2012-04-01 19:31:22 +00:00
Benjamin Kramer 682de39f2d Rip out emission of the regIsInRegClass function for the asm printer.
It's slow, bloated and completely redundant with MCRegisterClass::contains.

llvm-svn: 153782
2012-03-30 23:13:40 +00:00
Benjamin Kramer 88d31b3f0c Add a note about a missed cmov -> sbb opportunity.
llvm-svn: 153741
2012-03-30 13:02:58 +00:00
Lang Hames 5569ce7d56 Make x86 REP_MOV* and REP_STO instructions use the correct operand sizes in 64-bit mode.
llvm-svn: 153680
2012-03-29 19:54:28 +00:00
Benjamin Kramer 8619c37b5b Replace assert(0) with llvm_unreachable to avoid warnings about dropping off the end of a non-void function in Release builds.
llvm-svn: 153643
2012-03-29 12:37:26 +00:00
Craig Topper a0a603e582 Only allow symbolic names for (v)cmpss/sd/ps/pd encodings 8-31 to be used with 'v' version of instructions.
llvm-svn: 153636
2012-03-29 07:11:23 +00:00
Joel Jones 68d59e8a90 For X86, change load/dec-or-inc/store into dec-or-inc, respectively.
This is a code change to add support for changing instruction sequences of the form:

  load
  inc/dec of 8/16/32/64 bits
  store

into the appropriate X86 inc/dec through memory instruction:

  inc[qlwb] / dec[qlwb]

The checks that were in X86DAGToDAGISel::Select(SDNode *Node)>>ISD::STORE have been extracted to isLoadIncOrDecStore and reworked to use the better
named wrappers for getOperand(unsigned) (e.g. getOffset()) and replaced Chain.getNode() with LoadNode.  The comments have also been expanded.

llvm-svn: 153635
2012-03-29 05:45:48 +00:00
Joel Jones b474099e63 Reverted to revision 153616 to unblock build
llvm-svn: 153623
2012-03-29 01:20:56 +00:00
Joel Jones b88c81fe0f For X86, change load/dec-or-inc/store into dec-or-inc, respectively.
This is a code change to add support for changing instruction sequences of the form:

  load
  inc/dec of 8/16/32/64 bits
  store

into the appropriate X86 inc/dec through memory instruction:

  inc[qlwb] / dec[qlwb]

The checks that were in X86DAGToDAGISel::Select(SDNode *Node)>>ISD::STORE have been extracted to isLoadIncOrDecStore and reworked to use the better
named wrappers for getOperand(unsigned) (e.g. getOffset()) and replaced Chain.getNode() with LoadNode.  The comments have also been expanded.

llvm-svn: 153617
2012-03-29 00:37:47 +00:00
Craig Topper 1fcf5bcae1 Prune some includes
llvm-svn: 153502
2012-03-27 07:54:11 +00:00
Craig Topper f6e7e12f75 Remove unnecessary llvm:: qualifications
llvm-svn: 153500
2012-03-27 07:21:54 +00:00
Joerg Sonnenberger a29b5bd2a8 Put Is64BitMemOperand into !defined(NDEBUG) for now.
llvm-svn: 153185
2012-03-21 14:09:26 +00:00
Benjamin Kramer bc1066734f Use a signed value for this enum to avoid spuriuos warnings from gcc.
llvm-svn: 153184
2012-03-21 13:48:11 +00:00
Joerg Sonnenberger 5463e66768 Fix generation of the address size override prefix. Add assertions for
the invalid cases. At least 16bit operand in 64bit mode is currently not
rejected in the parser.

llvm-svn: 153166
2012-03-21 05:48:07 +00:00
Craig Topper 9cfc69c779 Spacing fixes and using 'unsigned' instead of 'int' to index to select shuffle elements for consistency with other shuffle code in X86 backend.
llvm-svn: 153154
2012-03-21 02:14:01 +00:00
Chad Rosier 4106917355 [avx] Add patterns for combining vextractf128 + vmovaps/vmovups/vmobdqu to
vextractf128 with 128-bit mem dest.

Combines

	vextractf128 $0, %ymm0, %xmm0
	vmovaps %xmm0, (%rdi)

to

    vextractf128 $0, %ymm0, (%rdi)

rdar://11082570

llvm-svn: 153139
2012-03-20 21:43:40 +00:00
Chad Rosier 0158ae2e5b [avx] Add the AddedComplexity to the VINSERTI128 avx2 patterns to give
precedence over the VINSERTF128 avx1 patterns.

llvm-svn: 153114
2012-03-20 19:45:07 +00:00
Chad Rosier 93d5427c69 Whitespace.
llvm-svn: 153105
2012-03-20 18:38:33 +00:00
Chad Rosier 5a6011267a [avx] Move the vextractf128 patterns closer to the vextractf128 def. Remove
whitespace from test case.  No functional change intended.

llvm-svn: 153103
2012-03-20 18:24:55 +00:00
Chad Rosier 07a4cb9382 [avx] Adjust the VINSERTF128rm pattern to allow for unaligned loads.
This results in things such as

	vmovups	16(%rdi), %xmm0
	vinsertf128	$1, %xmm0, %ymm0, %ymm0

to be combined to

    vinsertf128	$1, 16(%rdi), %ymm0, %ymm0

rdar://11076953

llvm-svn: 153092
2012-03-20 17:08:51 +00:00
Craig Topper b34d96c614 Remove code that prevented lowering shuffles if they are used by load and themselves used by a extract_vector_elt. This was done to allow the DAG combiner to collapse to a single element load. Unfortunately, sometimes the extract_vector_elt would disappear before DAG combine could do the transformation leaving a vector_shuffle that isel couldn't handle. New code lets the shuffle be converted to a target specific node, but then adds a combine routine that can convert target specific nodes back to vector_shuffles if the folding criteria are met.
llvm-svn: 153080
2012-03-20 07:17:59 +00:00
Craig Topper cbc96a6e90 Factor out target shuffle mask decoding from getShuffleScalarElt and use a SmallVector of int instead of unsigned for shuffle mask in decode functions. Preparation for another change.
llvm-svn: 153079
2012-03-20 06:42:26 +00:00
Preston Gurd 48ccc4df0b This patch adds X86 instruction itineraries for non-pseudo opcodes in
X86InstrCompiler.td.
 
It also adds –mcpu-generic to the legalize-shift-64.ll test so the test
will pass if run on an Intel Atom CPU, which would otherwise
produce an instruction schedule which differs from that which the test expects.

llvm-svn: 153033
2012-03-19 14:10:12 +00:00
Benjamin Kramer 57003a6768 Add a note for -ffast-math optimization of vector norm.
llvm-svn: 153031
2012-03-19 00:43:34 +00:00
Craig Topper 129f9ef669 isCommutedMOVLMask should only look at 128-bit vectors to match isMOVLMask.
llvm-svn: 153027
2012-03-18 22:50:10 +00:00
Craig Topper b25fda95f6 Reorder includes in Target backends to following coding standards. Remove some superfluous forward declarations.
llvm-svn: 152997
2012-03-17 18:46:09 +00:00
Chad Rosier b9b73170e3 [avx] Add patterns for VINSERTF128rm.
This results in things such as

	vmovaps	-96(%rbx), %xmm1
	vinsertf128	$1, %xmm1, %ymm0, %ymm0

to be combined to
         
	vinsertf128	$1, -96(%rbx), %ymm0, %ymm0

rdar://10643481

llvm-svn: 152762
2012-03-15 00:45:30 +00:00
Kevin Enderby 1ef22f33d0 Change the X86 assembler to not require a segment register on string
instruction's destination operand like it does for the source operand.
Also fix a typo in the comment for X86AsmParser::isSrcOp().

llvm-svn: 152654
2012-03-13 19:47:55 +00:00
Kevin Enderby fb3110b5d2 Added a missing error check for X86 assembly with mismatched base and index
registers not both being 64-bit or both being 32-bit registers.

llvm-svn: 152580
2012-03-12 21:32:09 +00:00
Craig Topper bef78fc2ee Convert more static tables of registers used by calling convention to uint16_t to reduce space.
llvm-svn: 152538
2012-03-11 07:57:25 +00:00
Kay Tiong Khoo 57c8e7f364 *fix typo in comment; test of commit access
llvm-svn: 152507
2012-03-10 21:29:49 +00:00
Benjamin Kramer adfc73d68f C files in llvm still have to be C89 compliant, remove C++-style comments.
llvm-svn: 152495
2012-03-10 15:10:06 +00:00
Bill Wendling ebb10df441 Fix disasm of iret, sysexit, and sysret when displayed with Intel syntax.
Patch by Kay Tiong Khoo!

llvm-svn: 152487
2012-03-10 07:37:27 +00:00
Kevin Enderby deed5aaa41 Add the missing call to Error when a bad X86 scale expression is parsed.
llvm-svn: 152443
2012-03-09 22:24:10 +00:00
Kevin Enderby 014e1cde5f Fix the x86 disassembler to at least print the lock prefix if it is the first
prefix.  Added a FIXME to remind us this still does not work when it is not the
first prefix.

llvm-svn: 152414
2012-03-09 17:52:49 +00:00
Craig Topper 2dac962864 Use uint16_t to store opcodes in static tables in X86 backend.
llvm-svn: 152391
2012-03-09 07:45:21 +00:00
Chad Rosier a281afc676 Fix a regression from r147481.
Original commit message from r147481:
DAGCombine for transforming 128->256 casts into a vmovaps, rather
then a vxorps + vinsertf128 pair if the original vector came from a load.

Fix:
Unaligned loads need to generate a vmovups.
rdar://10974078

llvm-svn: 152366
2012-03-09 02:00:48 +00:00
Eli Friedman de850676e0 Fix the operand ordering on aliases for shld and shrd. PR12173, part 2.
llvm-svn: 152136
2012-03-06 19:58:46 +00:00
Jim Grosbach fd93a59557 Make MCRegisterInfo available to the the MCInstPrinter.
Used to allow context sensitive printing of super-register or sub-register
references.

llvm-svn: 152043
2012-03-05 19:33:20 +00:00
Chad Rosier 9424aa1c51 Address Evan's comments for r151877.
Specifically, remove the magic number when checking to see if the copy has a 
glue operand and simplify the checking logic.

rdar://10930395

llvm-svn: 152041
2012-03-05 19:27:12 +00:00
Eli Friedman a5a6d6aa8f Make aliases for shld and shrd match gas. PR12173.
llvm-svn: 152014
2012-03-05 04:31:54 +00:00
Craig Topper 1d32658877 Use uint16_t to store register overlaps to reduce static data.
llvm-svn: 152001
2012-03-04 10:43:23 +00:00
Craig Topper 420525ce3b Use uint16_t to store registers in callee saved register tables to reduce size of static data.
llvm-svn: 151996
2012-03-04 03:33:22 +00:00
Craig Topper 6dedbae429 Use uint8_t instead of enums to store values in X86 disassembler table. Shaves 150k off the size of X86DisassemblerDecoder.o
llvm-svn: 151995
2012-03-04 02:16:41 +00:00
Chad Rosier f5e086f18e Prevent obscure and incorrect tail-call optimization.
In this instance we are generating the tail-call during legalizeDAG.  The 2nd
floor call can't be a tail call because it clobbers %xmm1, which is defined by
the first floor call.  The first floor call can't be a tail-call because it's
not in the tail position.  The only reasonable way I could think to fix this
in a target-independent manner was to check for glue logic on the copy reg.

rdar://10930395

llvm-svn: 151877
2012-03-02 02:50:46 +00:00
Michael J. Spencer 35145f830a Minimal changes for LLVM to compile under VS11.
llvm-svn: 151849
2012-03-01 22:42:52 +00:00
Kevin Enderby b119c08af3 Added annotations for x86 pc relative loads to llvm's 'C' disassembler.
So with darwin's otool(1) an x86_64 hello world .o file will print:
leaq	L_.str(%rip), %rax ## literal pool for: Hello world

llvm-svn: 151769
2012-02-29 22:58:34 +00:00
Andrew Trick 6eb6528b98 Intel Atom instruction itineraries for mov sign extension and mov zero extension.
Patch by Tyler Nowicki!

llvm-svn: 151743
2012-02-29 19:44:41 +00:00
Derek Schuff 56b662ce0f Make MemoryObject accessor members const again
llvm-svn: 151687
2012-02-29 01:09:06 +00:00
Evan Cheng 65f9d19c4f Re-commit r151623 with fix. Only issue special no-return calls if it's a direct call.
llvm-svn: 151645
2012-02-28 18:51:51 +00:00
Daniel Dunbar ee7b899343 Revert r151623 "Some ARM implementaions, e.g. A-series, does return stack prediction. ...", it is breaking the Clang build during the Compiler-RT part.
llvm-svn: 151630
2012-02-28 15:36:07 +00:00
Evan Cheng 87c7b09d8d Some ARM implementaions, e.g. A-series, does return stack prediction. That is,
the processor keeps a return addresses stack (RAS) which stores the address
and the instruction execution state of the instruction after a function-call
type branch instruction.

Calling a "noreturn" function with normal call instructions (e.g. bl) can
corrupt RAS and causes 100% return misprediction so LLVM should use a
unconditional branch instead. i.e.
mov lr, pc
b _foo
The "mov lr, pc" is issued in order to get proper backtrace.

rdar://8979299

llvm-svn: 151623
2012-02-28 06:42:03 +00:00
Preston Gurd a49ef92a76 This patch adds instruction latencies for the SSE instructions
to the instruction scheduler for the Intel Atom.

llvm-svn: 151590
2012-02-27 23:35:03 +00:00
Chad Rosier a72393a3f9 Add q suffix aliases for the fistp and fisttp mnemonics.
rdar://10921670
PR11935

llvm-svn: 151543
2012-02-27 19:43:12 +00:00
Craig Topper 6491c8020e X86 disassembler support for jcxz, jecxz, and jrcxz. Fixes PR11643. Patch by Kay Tiong Khoo.
llvm-svn: 151510
2012-02-27 01:54:29 +00:00
NAKAMURA Takumi bdf94879df Target/X86: Fix assertion failures and warnings caused by r151382 _ftol2 lowering for i386-*-win32 targets. Patch by Joe Groff.
[Joe Groff] Hi everyone. My previous patch applied as r151382 had a few problems:
Clang raised a warning, and X86 LowerOperation would assert out for
fptoui f64 to i32 because it improperly lowered to an illegal
BUILD_PAIR. Here's a patch that addresses these issues. Let me know if
any other changes are necessary. Thanks.

llvm-svn: 151432
2012-02-25 03:37:25 +00:00
Michael J. Spencer 248d65e78b Add WIN_FTOL_* psudo-instructions to model the unique calling convention
used by the Win32 _ftol2 runtime function. Patch by Joe Groff!

llvm-svn: 151382
2012-02-24 19:01:22 +00:00
Pete Cooper 682c76b7d4 Turn avx insert intrinsic calls into INSERT_SUBVECTOR DAG nodes and remove duplicate patterns for selecting the intrinsics
llvm-svn: 151342
2012-02-24 03:51:49 +00:00
Kevin Enderby 6fbcd8d439 Updated the llvm-mc disassembler C API to support for the X86 target.
rdar://10873652

As part of this I updated the llvm-mc disassembler C API to always call the
SymbolLookUp call back even if there is no getOpInfo call back.  If there is a
getOpInfo call back that is tried first and then if that gets no information
then the  SymbolLookUp is called.  I also made the code more robust by
memset(3)'ing to zero the LLVMOpInfo1 struct before then setting
SymbolicOp.Value before for the call to getOpInfo.  And also don't use any
values from the  LLVMOpInfo1 struct if getOpInfo returns 0.  And also don't
use any of the ReferenceType or ReferenceName values from SymbolLookUp if it
returns NULL. rdar://10873563 and rdar://10873683

For the X86 target also fixed bugs so the annotations get printed. 

Also fixed a few places in the ARM target that was not producing symbolic
operands for some instructions.  rdar://10878166

llvm-svn: 151267
2012-02-23 18:18:17 +00:00
Michael J. Spencer 8b98bf2d6b Properly emit _fltused with FastISel. Refactor to share code with SDAG.
Patch by Joe Groff!

llvm-svn: 151183
2012-02-22 19:06:13 +00:00
Chad Rosier 5dfe6dab25 Remove extra semi-colons.
llvm-svn: 151169
2012-02-22 17:25:00 +00:00
Craig Topper cc830f8cda Declare register classes as const. Fix a couple pointers to register classes that weren't already const.
llvm-svn: 151138
2012-02-22 07:28:11 +00:00
Craig Topper 760b134ffa Make all pointers to TargetRegisterClass const since they are all pointers to static data that should not be modified.
llvm-svn: 151134
2012-02-22 05:59:10 +00:00
Aaron Ballman e67173e718 Adding support for Microsoft's thiscall calling convention. LLVM side of the patch.
llvm-svn: 151123
2012-02-22 03:04:40 +00:00
Ahmed Charles 636a3d618c Remove dead code. Improve llvm_unreachable text. Simplify some control flow.
llvm-svn: 150918
2012-02-19 11:37:01 +00:00
Craig Topper de121a1000 Remove some unneeded includes and fix ordering in X86ISelLowering.cpp. Remove unneeded 'using namespace'.
llvm-svn: 150916
2012-02-19 07:15:48 +00:00
Craig Topper 65a4ceea1e Unify all shuffle mask checking functions take a mask and VT instead of VectorShuffleSDNode.
llvm-svn: 150913
2012-02-19 05:41:45 +00:00
Craig Topper 3e5c04e432 Make a bunch of X86ISelLowering shuffle functions static now that they are no longer needed by isel.
llvm-svn: 150908
2012-02-19 02:53:47 +00:00
Jia Liu e1d619691b some comment fix for X86 and ARM
llvm-svn: 150902
2012-02-19 02:03:36 +00:00
Craig Topper 66a3597a4a Add vmfunc instruction to X86 assembler and disassembler.
llvm-svn: 150899
2012-02-19 01:39:49 +00:00
Jia Liu b22310fda6 Emacs-tag and some comment fix for all ARM, CellSPU, Hexagon, MBlaze, MSP430, PPC, PTX, Sparc, X86, XCore.
llvm-svn: 150878
2012-02-18 12:03:15 +00:00
Craig Topper 57d3aaed78 Add X86InstrSVM.td that I forgot to add in r150873.
llvm-svn: 150874
2012-02-18 08:34:12 +00:00
Craig Topper ed7aa46366 Add X86 assembler and disassembler support for AMD SVM instructions. Original patch by Kay Tiong Khoo. Few tweaks by me for code density and to reduce replication.
llvm-svn: 150873
2012-02-18 08:19:49 +00:00
Craig Topper ba172d2d59 Remove the last of the old vector_shuffle patterns from X86 isel.
llvm-svn: 150795
2012-02-17 07:02:34 +00:00
Jakob Stoklund Olesen bc6ba479b6 Remove the YMM_HI_6_15 hack.
Call clobbers are now represented with register mask operands.  The
regmask can easily represent the fact that xmm6 is call-preserved while
ymm6 isn't.  This is automatically computed by TableGen from the
CalleeSavedRegs containing xmm6.

llvm-svn: 150709
2012-02-16 17:56:06 +00:00
Jakob Stoklund Olesen 97e3115dc2 Use the same CALL instructions for Windows as for everything else.
The different calling conventions and call-preserved registers are
represented with regmask operands that are added dynamically.

llvm-svn: 150708
2012-02-16 17:56:02 +00:00
Jakob Stoklund Olesen 8a450cb2fa Enable register mask operands for x86 calls.
Call instructions no longer have a list of 43 call-clobbered registers.
Instead, they get a single register mask operand with a bit vector of
call-preserved registers.

This saves a lot of memory, 42 x 32 bytes = 1344 bytes per call
instruction, and it speeds up building call instructions because those
43 imp-def operands no longer need to be added to use-def lists. (And
removed and shifted and re-added for every explicit call operand).

Passes like LiveVariables, LiveIntervals, RAGreedy, PEI, and
BranchFolding are significantly faster because they can deal with call
clobbers in bulk.

Overall, clang -O2 is between 0% and 8% faster, uniformly distributed
depending on call density in the compiled code.  Debug builds using
clang -O0 are 0% - 3% faster.

I have verified that this patch doesn't change the assembly generated
for the LLVM nightly test suite when building with -disable-copyprop
and -disable-branch-fold.

Branch folding behaves slightly differently in a few cases because call
instructions have different hash values now.

Copy propagation flushes its data structures when it crosses a register
mask operand. This causes it to leave a few dead copies behind, on the
order of 20 instruction across the entire nightly test suite, including
SPEC. Fixing this properly would require the pass to use different data
structures.

llvm-svn: 150638
2012-02-16 00:02:50 +00:00
Chad Rosier f0687634c3 Use a temporary variable, rather then a series of redundant calls.
llvm-svn: 150538
2012-02-15 00:36:26 +00:00
Pete Cooper c21ebf5c41 Stop custom lowering forr x86 DEC64m from happening if the load in the lowered sequence has more than 1 user
llvm-svn: 150537
2012-02-15 00:33:37 +00:00
Craig Topper cfad98f745 Move old movl vector_shuffle patterns. Not needed anymore since vector_shuffles shouldn't reach isel.
llvm-svn: 150462
2012-02-14 08:14:53 +00:00
Craig Topper 8b19d78808 Still more vector_shuffle pattern removal.
llvm-svn: 150365
2012-02-13 07:23:41 +00:00
Ahmed Charles 32e983e4fc Fix various issues (or do cleanups) found by enabling certain MSVC warnings.
- Use unsigned literals when the desired result is unsigned. This mostly allows unsigned/signed mismatch warnings to be less noisy even if they aren't on by default.
- Remove misplaced llvm_unreachable.
- Add static to a declaration of a function on MSVC x86 only.
- Change some instances of calling a static function through a variable to simply calling that function while removing the unused variable.

llvm-svn: 150364
2012-02-13 06:30:56 +00:00
Craig Topper 74650add0e Remove more vector_shuffle patterns for unpack. These should be target specific nodes when they get to isel.
llvm-svn: 150363
2012-02-13 05:48:49 +00:00
Craig Topper 6d471c9e49 Recommit r150328. Previous test failures should be fixed by r150360.
llvm-svn: 150362
2012-02-13 05:10:10 +00:00
Craig Topper 87119fa37f Update CanXFormVExtractWithShuffleIntoLoad to ensure bitcasts of loads only have one use. Matches DAGCombiner and prevents vector_shuffles from reaching isel.
llvm-svn: 150360
2012-02-13 04:30:38 +00:00
NAKAMURA Takumi 0826c17d00 Revert r150328, "Remove more vector_shuffle patterns."
It caused 3 failures on pre-penryn and non-x86(generic) hosts.

llvm-svn: 150357
2012-02-13 00:10:15 +00:00
Pete Cooper 71be57bb32 Fixed bug when custom lowering DEC64m on x86.
If the DEC node had more than one user, it was doing this lowering but
leaving the original DEC node around and so decrementing twice.

Fixes PR11964.

llvm-svn: 150356
2012-02-13 00:10:03 +00:00
Craig Topper e24c94af81 Remove more vector_shuffle patterns.
llvm-svn: 150328
2012-02-12 08:14:35 +00:00
Craig Topper d40d9eb2b3 Remove more vector_shuffle patterns.
llvm-svn: 150321
2012-02-12 01:07:34 +00:00
Craig Topper 330ca97700 Remove more vector_shuffle patterns.
llvm-svn: 150314
2012-02-11 23:31:01 +00:00
Anton Korobeynikov c6b4017ce2 Add support for implicit TLS model used with MS VC runtime.
Patch by Kai Nacke!

llvm-svn: 150307
2012-02-11 17:26:53 +00:00
Benjamin Kramer 915e3d9568 Don't mix declarations and code.
llvm-svn: 150305
2012-02-11 16:01:02 +00:00
Benjamin Kramer 428704eb52 Make the EDis tables const.
llvm-svn: 150304
2012-02-11 14:51:07 +00:00
Benjamin Kramer 478e8de8ef Reuse the enum names from X86Desc in the X86Disassembler.
This requires some gymnastics to make it available for C code. Remove the names
from the disassembler tables, making them relocation free.

llvm-svn: 150303
2012-02-11 14:50:54 +00:00
Craig Topper 981c6cf7b3 Remove some patterns for matching vector_shuffle instructions since vector_shuffles should be custom lowered before isel.
llvm-svn: 150299
2012-02-11 07:43:35 +00:00
Craig Topper 11826a6e10 Fix shuffle lowering code to stop creating temporary DAG nodes to do shuffle mask checks on. This seemed to be confusing things such that vector_shuffle ops to got through to iselection. This is another step towards removing the vector_shuffle handling patterns from isel.
llvm-svn: 150296
2012-02-11 06:24:48 +00:00
Craig Topper a0cd970b81 More tweaks to get the size of the X86 disassembler tables down.
llvm-svn: 150167
2012-02-09 08:58:07 +00:00
Craig Topper 487e744f66 Flatten some of the arrays in the X86 disassembler tables to reduce space needed to store pointers on 64-bit hosts and reduce relocations needed at startup. Part of PR11953.
llvm-svn: 150161
2012-02-09 07:45:30 +00:00
Jakob Stoklund Olesen 4519fd0b21 Handle register masks when searching for EFLAGS clobbers.
Calls clobber the flags, but when using register masks there is no
EFLAGS<imp-def> operand.

llvm-svn: 150117
2012-02-09 00:17:22 +00:00
Elena Demikhovsky 1adc1d53dd Fixed a bug in printing "cmp" pseudo ops.
> This IR code
> %res = call <8 x float> @llvm.x86.avx.cmp.ps.256(<8 x float> %a0, <8 x float> %a1, i8 14)
> fails with assertion:
>
> llc: X86ATTInstPrinter.cpp:62: void llvm::X86ATTInstPrinter::printSSECC(const llvm::MCInst*, unsigned int, llvm::raw_ostream&): Assertion `0 && "Invalid ssecc argument!"' failed.
> 0  llc             0x0000000001355803
> 1  llc             0x0000000001355dc9
> 2  libpthread.so.0 0x00007f79a30575d0
> 3  libc.so.6       0x00007f79a23a1945 gsignal + 53
> 4  libc.so.6       0x00007f79a23a2f21 abort + 385
> 5  libc.so.6       0x00007f79a239a810 __assert_fail + 240
> 6  llc             0x00000000011858d5 llvm::X86ATTInstPrinter::printSSECC(llvm::MCInst const*, unsigned int, llvm::raw_ostream&) + 119

I added the full testing for all possible pseudo-ops of cmp.
I extended X86AsmPrinter.cpp and X86IntelInstPrinter.cpp.

You'l also see lines alignments (unrelated to this fix) in X86IselLowering.cpp from my previous check-in.

llvm-svn: 150068
2012-02-08 08:37:26 +00:00
Craig Topper 172b9243cd Remove a couple unneeded intrinsic patterns
llvm-svn: 150067
2012-02-08 08:29:30 +00:00
Craig Topper 5405571fe0 Remove GCC builtins for vpermilp* intrinsics as clang no longer needs them. Custom lower the intrinsics to the vpermilp target specific node and remove intrinsic patterns.
llvm-svn: 150060
2012-02-08 06:36:57 +00:00
Evan Cheng 1b81fddd65 Use LEA to adjust stack ptr for Atom. Patch by Andy Zhang.
llvm-svn: 150008
2012-02-07 22:50:41 +00:00
Craig Topper b27fd77c3f Add instruction selection for 256-bit VPSHUFD and 128-bit VPERMILPS/VPERMILPD.
llvm-svn: 149968
2012-02-07 06:28:42 +00:00
Derek Schuff 8b2dcad4b5 Enable streaming of bitcode
This CL delays reading of function bodies from initial parse until
materialization, allowing overlap of compilation with bitcode download.

llvm-svn: 149918
2012-02-06 22:30:29 +00:00
Chris Lattner 8213c8af29 Remove some dead code and tidy things up now that vectors use ConstantDataVector
instead of always using ConstantVector.

llvm-svn: 149912
2012-02-06 21:56:39 +00:00
Benjamin Kramer 2496717052 X86: Don't call malloc for 4 bits. No functionality change.
llvm-svn: 149866
2012-02-06 12:06:18 +00:00
Craig Topper 1f71057747 Add shuffle decoding support for 256-bit pshufd. Merge vpermilp* and pshufd decoding.
llvm-svn: 149859
2012-02-06 07:17:51 +00:00
Duncan Sands ae22c60f90 Persuade GCC that there is nothing worth warning about here (there isn't).
llvm-svn: 149834
2012-02-05 14:20:11 +00:00
Chandler Carruth ebd90c58e6 Begin fleshing out more convenience predicates in llvm::Triple and
convert at least one client over to use them. Subsequent patches both to
LLVM and Clang will try to convert more people over to a common set of
predicates.

This round of predicates is focused on OS-categorization predicates.

llvm-svn: 149815
2012-02-05 08:26:40 +00:00
Craig Topper c4965bce14 Convert assert(0) to llvm_unreachable
llvm-svn: 149814
2012-02-05 07:21:30 +00:00
Craig Topper 4ed7278ff4 Convert assert(0) to llvm_unreachable in X86 Target directory.
llvm-svn: 149809
2012-02-05 05:38:58 +00:00
Craig Topper 83f3bdaa45 Convert some assert(0) in default of switch statements to llvm_unreachable.
llvm-svn: 149808
2012-02-05 03:43:23 +00:00
Craig Topper 1d471e31ba Add target specific node for PMULUDQ. Change patterns to use it and custom lower intrinsics to it. Use it instead of intrinsic to handle 64-bit vector multiplies.
llvm-svn: 149807
2012-02-05 03:14:49 +00:00
Craig Topper 4daa67483d Remove most of the intrinsics for XOP VPCMOV instruction. They all aliased to the same instruction with different types. This would be better accomplished with casts in the not yet created xopintrin.h header file.
llvm-svn: 149795
2012-02-05 00:55:56 +00:00
Andrew Trick f8ea108c05 TargetPassConfig: confine the MC configuration to TargetMachine.
Passes prior to instructon selection are now split into separate configurable stages.
Header dependencies are simplified.
The bulk of this diff is simply removal of the silly DisableVerify flags.

Sorry for the target header churn. Attempting to stabilize them.

llvm-svn: 149754
2012-02-04 02:56:59 +00:00
Craig Topper 47e6d26911 Remove getShuffleVPERMILPImmediate function, getShuffleSHUFImmediate performs the same calculation.
llvm-svn: 149683
2012-02-03 06:52:33 +00:00
Craig Topper d5ffe0900d Remove unnecessary qualification on 256-bit vector handling in LowerBUILD_VECTOR. Condition was already guaranteed by earlier code.
llvm-svn: 149680
2012-02-03 06:32:21 +00:00
Andrew Trick ccb673659a Added TargetPassConfig. The first little step toward configuring codegen passes.
Allows command line overrides to be centralized in LLVMTargetMachine.cpp.
LLVMTargetMachine can intercept common passes and give precedence to command line overrides.
Allows adding "internal" target configuration options without touching TargetOptions.
Encapsulates the PassManager.
Provides a good point to initialize all CodeGen passes so that Pass ID's can be used in APIs.
Allows modifying the target configuration hooks without rebuilding the world.

llvm-svn: 149672
2012-02-03 05:12:41 +00:00
Andrew Trick 808a7a6ce6 whitespace
llvm-svn: 149671
2012-02-03 05:12:30 +00:00
Lang Hames bb682450f9 Incorporate suggestions Chad, Jakob and Evan's suggestions on r149957.
llvm-svn: 149655
2012-02-03 01:13:49 +00:00
Jakob Stoklund Olesen 5e1ac45b93 Require non-NULL register masks.
It doesn't seem worthwhile to give meaning to a NULL register mask
pointer. It complicates all the code using register mask operands.

llvm-svn: 149646
2012-02-02 23:52:57 +00:00
Elena Demikhovsky 6fbb4d2842 Minor change in signature of the getZeroVector()
llvm-svn: 149601
2012-02-02 09:20:18 +00:00
Elena Demikhovsky fb44980b41 Optimization for SIGN_EXTEND operation on AVX.
Special handling was added for v4i32 -> v4i64 and v8i16 -> v8i32
extensions.

llvm-svn: 149600
2012-02-02 09:10:43 +00:00
Francois Pichet 26f302d568 Unbreak the MSVC build.
llvm-svn: 149599
2012-02-02 08:36:09 +00:00
Lang Hames 0269caafa6 Set EFLAGS correctly in EmitLoweredSelect on X86.
llvm-svn: 149597
2012-02-02 07:48:37 +00:00
Andrew Trick 8523b16ff5 Instruction scheduling itinerary for Intel Atom.
Adds an instruction itinerary to all x86 instructions, giving each a default latency of 1, using the InstrItinClass IIC_DEFAULT.

Sets specific latencies for Atom for the instructions in files X86InstrCMovSetCC.td, X86InstrArithmetic.td, X86InstrControl.td, and X86InstrShiftRotate.td. The Atom latencies for the remainder of the x86 instructions will be set in subsequent patches.

Adds a test to verify that the scheduler is working.

Also changes the scheduling preference to "Hybrid" for i386 Atom, while leaving x86_64 as ILP.

Patch by Preston Gurd!

llvm-svn: 149558
2012-02-01 23:20:51 +00:00
Mon P Wang 9f05206659 Avoid creating an extract element to an illegal type after LegalizeTypes has run.
llvm-svn: 149548
2012-02-01 22:15:20 +00:00
Chad Rosier e273cb08c4 Tidy up.
llvm-svn: 149521
2012-02-01 18:45:51 +00:00
Elena Demikhovsky 824eed70a6 Passing AVX 256-bit structures in Win64 was wrong.
Fixed Win64 calling conventions.

llvm-svn: 149494
2012-02-01 10:46:14 +00:00
Elena Demikhovsky 34cca175ab Shortened code in shuffle masks
llvm-svn: 149493
2012-02-01 10:33:05 +00:00
Elena Demikhovsky 0e48c70ba7 Optimization for "truncate" operation on AVX.
Truncating v4i64 -> v4i32 and v8i32 -> v8i16 may be done with set of shuffles.

llvm-svn: 149485
2012-02-01 07:56:44 +00:00
Craig Topper 9cdb8bdf04 Don't create VBROADCAST nodes if any nodes use the chain result from the load. Fixes PR11900.
llvm-svn: 149478
2012-02-01 06:51:58 +00:00
Devang Patel a173ee56fd Add assembler dialect attribute in asm parser which lets target specific asm parser change dialect on the fly.
llvm-svn: 149396
2012-01-31 18:14:05 +00:00
Craig Topper b85e40f738 Remove pcmpgt/pcmpeq intrinsics as clang is not using them.
llvm-svn: 149367
2012-01-31 06:52:44 +00:00
Evan Cheng 4e7992eeba PR11834: Use macros which are defined on Windows. Patch by Marina Yatsina.
llvm-svn: 149294
2012-01-30 23:10:32 +00:00
Devang Patel 7cdb2ff6b5 Intel syntax. Adjust special code, used to recognize cmp<comparison code>{ss,sd,ps,pd}, for intel syntax.
llvm-svn: 149291
2012-01-30 22:47:12 +00:00
Devang Patel 9a9bb5c5db Intel syntax. Support .intel_syntax directive.
llvm-svn: 149270
2012-01-30 20:02:42 +00:00
Benjamin Kramer 396c590818 Fix refacto.
llvm-svn: 149269
2012-01-30 20:01:35 +00:00
Douglas Gregor e577cfe172 Eliminate narrowing conversion in initializer list, to make C++11 happy
llvm-svn: 149254
2012-01-30 16:57:18 +00:00
Benjamin Kramer 20af25f47b X86: Simplify shuffle mask generation code.
llvm-svn: 149248
2012-01-30 15:16:21 +00:00
Craig Topper 516cba3380 Fix pattern for memory form of PSHUFD for use with FP vectors to remove bitcast to an integer vector that normal code wouldn't have. Also remove bitcasts from code that turns splat vector loads into a shuffle as it was making the broken pattern necessary.
llvm-svn: 149232
2012-01-30 07:50:31 +00:00
Craig Topper ca29bcfc10 Move some XOP patterns into instruction definition. Replae VPCMOV intrinsic patterns with custom lowering to a target specific nodes.
llvm-svn: 149216
2012-01-30 01:10:15 +00:00
Devang Patel 63fe5697f4 Intel Syntax: Parse mem operand with seg reg. QWORD PTR FS:[320]
llvm-svn: 149142
2012-01-27 19:48:28 +00:00
Craig Topper 5639e9e8fb Move some patterns back near their instructions and use AddedComplexity to fix priority. Merge some patterns into their instruction definition.
llvm-svn: 149122
2012-01-27 07:09:40 +00:00
Jim Grosbach 8f28dbdde5 Keep source location information for X86 MCFixup's.
llvm-svn: 149106
2012-01-27 00:51:27 +00:00
Jakob Stoklund Olesen fc9dce25f7 Handle call-clobbered ymm registers on Win64.
The Win64 calling convention has xmm6-15 as callee-saved while still
clobbering all ymm registers.

Add a YMM_HI_6_15 pseudo-register that aliases the clobbered part of the
ymm registers, and mark that as call-clobbered.  This allows live xmm
registers across calls.

This hack wouldn't be necessary with RegisterMask operands representing
the call clobbers, but they are not quite operational yet.

llvm-svn: 149088
2012-01-26 22:59:28 +00:00
Victor Umansky 5f29b0e57b Fix for the following bug in AVX codegen for double-to-int conversions:
.	"fptosi" and "fptoui" IR instructions are defined with round-to-zero rounding mode.
.	Currently for AVX mode for <4xdouble> and <8xdouble>  the "VCVTPD2DQ.128" and "VCVTPD2DQ.256" instructions are selected (for .fp_to_sint. DAG node operation ) by AVX codegen. However they use round-to-nearest-even rounding mode.
.	Consequently, the conversion produces incorrect numbers.
 
The fix is to replace selection of VCVTPD2DQ instructions with VCVTTPD2DQ instructions. The latter use truncate (i.e. round-to-zero) rounding mode. 
As .fp_to_sint. DAG node operation is used only for lowering of  "fptosi" and "fptoui" IR instructions, the fix in X86InstrSSE.td definition file doesn.t have an impact on other LLVM flows.
 
The patch includes changes in the .td file, LIT test for the changes and a fix in a legacy LIT test (which produced asm code conflicting with LLVN IR spec). 

llvm-svn: 149056
2012-01-26 08:51:39 +00:00
Craig Topper 86e44bc829 Add HasXOP predicate check covering a bunch of XOP intrinsic patterns.
llvm-svn: 149054
2012-01-26 07:51:55 +00:00
Craig Topper 1c0e22f57a Fix AVX vs SSE patterns ordering issue for VPCMPESTRM and VPCMPISTRM.
llvm-svn: 149053
2012-01-26 07:31:30 +00:00
Craig Topper b91760eff8 Remove some more patterns by custom lowering intrinsics to target specific nodes.
llvm-svn: 149052
2012-01-26 07:18:03 +00:00
Chris Lattner 33633a90a0 fix a bug I introduced in r148929, this is not a splat!
Thanks to Eli for noticing.

llvm-svn: 148947
2012-01-25 09:56:22 +00:00
Craig Topper 7834900950 Custom lower PSIGN and PSHUFB intrinsics to their corresponding target specific nodes so we can remove the isel patterns.
llvm-svn: 148933
2012-01-25 06:43:11 +00:00
Chris Lattner 47a86bdbe2 use ConstantVector::getSplat in a few places.
llvm-svn: 148929
2012-01-25 06:02:56 +00:00
Craig Topper ce4f9c5668 Custom lower phadd and phsub intrinsics to target specific nodes. Remove the patterns that are no longer necessary.
llvm-svn: 148927
2012-01-25 05:37:32 +00:00
Craig Topper 5bcf070e68 Remove AVX 256-bit unaligned load intrinsics. 128-bit versions had been removed a while ago.
llvm-svn: 148922
2012-01-25 04:42:03 +00:00
Craig Topper 3ad5bc019a Merge intrinsic pattern and no pattern versions of VCVTSD2SI intruction definitions. Matches non-AVX version of same instructions.
llvm-svn: 148914
2012-01-25 03:52:09 +00:00
Devang Patel a410ed3ced Intel Syntax: Extend special hand coded logic, to recognize special instructions, for intel syntax.
llvm-svn: 148864
2012-01-24 21:43:36 +00:00
Elena Demikhovsky 0b0c5d8c4c ZERO_EXTEND operation is optimized for AVX.
v8i16 -> v8i32, v4i32 -> v4i64 - used vpunpck* instructions.

llvm-svn: 148803
2012-01-24 13:54:13 +00:00
Craig Topper 0d8e67aebd Add comments near load pattern fragments indicating that all integer vector loads are promoted to v2i64 or v4i64 so that no one tries to reintroduce pattern fragments for other types.
llvm-svn: 148771
2012-01-24 03:03:17 +00:00
Devang Patel eba7d3dba9 Fix typo.
llvm-svn: 148751
2012-01-23 23:56:33 +00:00
Devang Patel cf893a437e Intel syntax: Robustify parsing of memory operand's displacement experssion.
llvm-svn: 148737
2012-01-23 22:35:25 +00:00
Devang Patel e660fdd953 Intel syntax: Parse memory operand with empty base reg, e.g. DWORD PTR [4*RDI]
llvm-svn: 148721
2012-01-23 20:20:06 +00:00
Devang Patel 880bc1644b Intel syntax: Parse segment registers.
llvm-svn: 148712
2012-01-23 18:31:58 +00:00
Craig Topper edd1d0acfc Custom lower PCMPEQ/PCMPGT intrinsics to target specific nodes and remove the intrinsic patterns.
llvm-svn: 148687
2012-01-23 08:18:28 +00:00
Craig Topper 6b90c5d03e Update more places to use target specific nodes for vector shifts instead of intrinsics.
llvm-svn: 148685
2012-01-23 06:46:22 +00:00
Craig Topper 5e80db4e4f Custom lower vector shift intrinsics to target specific nodes and remove the patterns that are no longer needed.
llvm-svn: 148684
2012-01-23 06:16:53 +00:00
Craig Topper 20c98df340 Remove pattern fragments for v32i8, v16i16, v8i32, v16i8, v8i16, and v4i32 loads. All integer vector loads are promoted to v2i64 or v4i64 so these pattern fragments can never match. Fix or remove patterns that used these fragments.
llvm-svn: 148672
2012-01-23 00:06:44 +00:00
Craig Topper 0b7ad76bd0 Combine X86 CMPPD and CMPPS node types. Simplifies selection code and pattern matching.
llvm-svn: 148670
2012-01-22 23:36:02 +00:00
Craig Topper bd4884371b Merge PCMPEQB/PCMPEQW/PCMPEQD/PCMPEQQ and PCMPGTB/PCMPGTW/PCMPGTD/PCMPGTQ X86 ISD node types into only two node types. Simplifying opcode selection and pattern matching.
llvm-svn: 148667
2012-01-22 22:42:16 +00:00
Craig Topper 094626414d Add target specific ISD node types for SSE/AVX vector shuffle instructions and change all the code that used to create intrinsic nodes to create the new nodes instead.
llvm-svn: 148664
2012-01-22 19:15:14 +00:00
Craig Topper a4ed5246d8 Make code a little less verbose.
llvm-svn: 148651
2012-01-22 03:07:48 +00:00
Craig Topper cb3433cd58 Remove unused X86 ISD node type defines.
llvm-svn: 148644
2012-01-22 01:15:56 +00:00
Craig Topper 123adfa0f3 Move some vector shift patterns into their instruction definitions.
llvm-svn: 148643
2012-01-22 00:41:20 +00:00
Craig Topper dcaa5fbd08 Add memory patterns for some of the fp<->integer conversion instructions. Fold some patterns into instruction definitions.
llvm-svn: 148641
2012-01-21 18:37:15 +00:00
Benjamin Kramer 5cff13a3fb Remove unused variables.
llvm-svn: 148635
2012-01-21 10:42:44 +00:00
Craig Topper 39bc1e4d25 Fix PR11819 introduced by r148537. I'd commit the test case, but the generated code is terrible as it gets fully scalarized. Expect a future commit to fix that.
llvm-svn: 148632
2012-01-21 08:49:33 +00:00
Devang Patel ce6a2ca8c8 Intel syntax: Robustify register parsing.
llvm-svn: 148591
2012-01-20 22:32:05 +00:00
David Blaikie 46a9f016c5 More dead code removal (using -Wunreachable-code)
llvm-svn: 148578
2012-01-20 21:51:11 +00:00
Devang Patel d0930fff85 Intel syntax: Parse ... PTR [-8]
llvm-svn: 148570
2012-01-20 21:21:01 +00:00
Devang Patel f36613cb45 Intel syntax: For now, disable ambiguous JMP64pcrel32 for intel syntax.
llvm-svn: 148569
2012-01-20 21:14:06 +00:00
Craig Topper a409479023 Improve 256-bit shuffle splitting to allow 2 sources in each 128-bit lane. As long as only a single lane of the source is used in the lane in the destination. This makes the splitting match much closer to what happens with 256-bit shuffles when AVX is disabled and only 128-bit XMM is allowed.
llvm-svn: 148537
2012-01-20 09:29:03 +00:00
Craig Topper 3469212c82 Add support for selecting 256-bit PALIGNR.
llvm-svn: 148532
2012-01-20 05:53:00 +00:00
Eli Friedman 32c7c25dcb Support MSVC x86-32 sret convention. PR11688. Patch by Joe Groff.
llvm-svn: 148513
2012-01-20 00:05:46 +00:00
Devang Patel f83dcfd052 Post process 'and', 'sub' instructions and select better encoding, if available.
llvm-svn: 148489
2012-01-19 18:40:55 +00:00
Devang Patel 2529dd9e00 Intel syntax: There is no need to create unary expr for simple negative displacement.
llvm-svn: 148486
2012-01-19 18:15:51 +00:00
Devang Patel 4a62ff9bcb Post process 'xor', 'or' and 'cmp' instructions and select better encoding, if available.
llvm-svn: 148485
2012-01-19 17:53:25 +00:00
Craig Topper a875b7ccc7 Folding table additions and fixes for AVX.
llvm-svn: 148467
2012-01-19 08:50:38 +00:00
Craig Topper 80576e8d1f Merge 128-bit and 256-bit SHUFPS/SHUFPD handling.
llvm-svn: 148466
2012-01-19 08:19:12 +00:00
Nick Lewycky ecc0084f72 Add a TargetOption for disabling tail calls.
llvm-svn: 148442
2012-01-19 00:34:10 +00:00
Jakob Stoklund Olesen ff482f733b Add experimental -x86-use-regmask command line option.
It adds register mask operands to x86 call instructions.  Once all the
backend passes support register mask operands, this will be permanently
enabled.

llvm-svn: 148438
2012-01-18 23:52:22 +00:00
Jakob Stoklund Olesen f1fb1d2375 Ignore register mask operands when lowering instructions to MC.
This is similar to implicit register operands.  MC doesn't understand
register liveness and call clobbers.

llvm-svn: 148437
2012-01-18 23:52:19 +00:00
Devang Patel de47cced25 Process instructions after match to select alternative encoding which may be more desirable.
llvm-svn: 148431
2012-01-18 22:42:29 +00:00
Jim Grosbach aba3de99c0 Tidy up. MCAsmBackend naming conventions.
llvm-svn: 148400
2012-01-18 18:52:16 +00:00
Jakob Stoklund Olesen f43b599550 Add a CoveredBySubRegs property to Register descriptions.
When set, this bit indicates that a register is completely defined by
the value of its sub-registers.

Use the CoveredBySubRegs property to infer which super-registers are
call-preserved given a list of callee-saved registers.  For example, the
ARM registers D8-D15 are callee-saved.  This now automatically implies
that Q4-Q7 are call-preserved.

Conversely, Win64 callees save XMM6-XMM15, but the corresponding
YMM6-YMM15 registers are not call-preserved because they are not fully
defined by their sub-registers.

llvm-svn: 148363
2012-01-18 00:16:39 +00:00
Jakob Stoklund Olesen d51a710bde Move X86 callee saved register lists to the X86CallConv .td file.
Add a trivial implementation of the getCallPreservedMask() hook.

llvm-svn: 148347
2012-01-17 22:47:01 +00:00
Devang Patel c9ed518792 Intel syntax: Fix parser match class to check memory operand size.
llvm-svn: 148338
2012-01-17 21:48:03 +00:00
Devang Patel a7143b6a2b Intel syntax: Parse "BYTE PTR [RDX + RCX]"
llvm-svn: 148334
2012-01-17 21:25:10 +00:00
Devang Patel 2ed6718616 Untabify.
llvm-svn: 148322
2012-01-17 19:09:22 +00:00
Devang Patel 8b39be79ad Intel syntax: Do not unncessarily create plus expression for memory operand displacement.
llvm-svn: 148321
2012-01-17 19:08:07 +00:00
Devang Patel 41b9ddeb7a Intel syntax: Robustify memory operand parsing.
llvm-svn: 148312
2012-01-17 18:00:18 +00:00
Nadav Rotem 86c3807b99 Fix warning.
llvm-svn: 148301
2012-01-17 09:31:09 +00:00
Nadav Rotem 86e5390dbf Fix 11769.
In CanXFormVExtractWithShuffleIntoLoad we assumed that EXTRACT_VECTOR_ELT can be later handled by the DAGCombiner.
However, in some cases on AVX, the EXTRACT_VECTOR_ELT is legalized to EXTRACT_SUBVECTOR + EXTRACT_VECTOR_ELT, which
currently is not handled by the DAGCombiner. In this patch I added a check that we only extract from the XMM part.

llvm-svn: 148298
2012-01-17 09:13:19 +00:00
Craig Topper 9cafcd8baa Remove unnecessary AVX check from an assert. hasSSE2 is enough.
llvm-svn: 148295
2012-01-17 08:23:44 +00:00
Craig Topper 37b10ef250 Fix a crasher when PerformShiftCombine receives a BUILD_VECTOR of all UNDEF. Probably could use better handling in DAG combine or getNode. Fixes PR11772.
llvm-svn: 148285
2012-01-17 04:44:50 +00:00
Eli Friedman 206ca569aa Make sure the non-SSE lowering for fences correctly clobbers EFLAGS. PR11768.
llvm-svn: 148240
2012-01-16 16:42:21 +00:00
Eli Friedman 75e3db4c7a Get rid of unused codegen-only instruction.
llvm-svn: 148239
2012-01-16 16:29:35 +00:00
Craig Topper db8890aedd Give priority to AVX over SSE for 128-bit floating point unpck instructions.
llvm-svn: 148233
2012-01-16 09:56:42 +00:00
Nadav Rotem 57935243bd [AVX] Optimize x86 VSELECT instructions using SimplifyDemandedBits.
We know that the blend instructions only use the MSB, so if the mask is
sign-extended then we can convert it into a SHL instruction. This is a
common pattern because the type-legalizer sign-extends the i1 type which
is used by the LLVM-IR for the condition.

Added a new optimization in SimplifyDemandedBits for SIGN_EXTEND_INREG -> SHL.

llvm-svn: 148225
2012-01-15 19:27:55 +00:00
Benjamin Kramer 339ced4e34 Return an ArrayRef from ShuffleVectorSDNode::getMask and push it through CodeGen.
llvm-svn: 148218
2012-01-15 13:16:05 +00:00
Craig Topper c10e1abaf3 Fix the memop type on a couple 256-bit AVX instructions that were using f128mem instead of f256mem.
llvm-svn: 148196
2012-01-14 18:29:57 +00:00
Craig Topper d78429f850 Add a bunch of AVX instructions to the folding tables. Also fixed the alignment on 256-bit AVX2 instructions.
llvm-svn: 148194
2012-01-14 18:14:53 +00:00
Chad Rosier 71a185c5c6 Fix pasto from r146196.
llvm-svn: 148167
2012-01-14 01:50:21 +00:00
Devang Patel 7066d28043 Revert r148131, it was committed before it was ready.
llvm-svn: 148134
2012-01-13 19:28:58 +00:00
Devang Patel 7ecdc6d4f5 Refactor.
llvm-svn: 148131
2012-01-13 19:12:18 +00:00
Craig Topper e52d86a740 Convert SHUFPD with the same register for both sources to PSHUFD if it would prevent a register copy. Similar to SHUFPS, but requires the mask to be converted.
llvm-svn: 148112
2012-01-13 09:21:41 +00:00
Craig Topper b1c2ebf6ee use v8i32 as optimal mem type over v8f32 if AVX2 is enabled. Similar to SSE2 vs SSE1.
llvm-svn: 148109
2012-01-13 08:32:21 +00:00
Craig Topper cb7e13d7c0 Make X86 instruction selection use 256-bit VPXOR for build_vector of all ones if AVX2 is enabled. This gives the ExeDepsFix pass a chance to choose FP vs int as appropriate. Also use v8i32 as the type for getZeroVector if AVX2 is enabled. This is consistent with SSE2 using prefering v4i32.
llvm-svn: 148108
2012-01-13 08:12:35 +00:00
Craig Topper 9f14d9f939 Add patterns for v16i16 and v32i8 immAllZerosV to select VPXOR to match v4i64 and v8i32.
llvm-svn: 148106
2012-01-13 06:59:47 +00:00
Craig Topper a4c5a47b97 Use 8i32 constant pool entry for converting AVX2_SETALLONES. Possibly fixes PR11750.
llvm-svn: 148101
2012-01-13 06:12:41 +00:00
Craig Topper 2aa07f832e Fix typo in PerformAddCombine that caused any vector type to be checked for horizontal add/sub if AVX2 is enabled. This caused an assert to fail for non 128/256-bit vectors when done before type legalizing. Fixes PR11749.
llvm-svn: 148096
2012-01-13 05:04:25 +00:00
Bill Wendling 9c8456f7ef Fix off-by-one error.
llvm-svn: 148077
2012-01-13 00:41:53 +00:00
Bill Wendling ee5eaebc58 Fix the code that was WRONG.
The registers are placed into the saved registers list in the reverse order,
which is why the original loop was written to loop backwards.

llvm-svn: 148064
2012-01-12 23:05:03 +00:00
Elena Demikhovsky 060f6ccdb8 Fixed a bug in LowerVECTOR_SHUFFLE caused assertion failure
lc: X86ISelLowering.cpp:6480: llvm::SDValue llvm::X86TargetLowering::LowerVECTOR_SHUFFLE(llvm::SDValue, llvm::SelectionDAG&) const: Assertion `V1.getOpcode() != ISD::UNDEF&&  "Op 1 of shuffle should not be undef"' failed.
Added a test.

llvm-svn: 148044
2012-01-12 20:33:10 +00:00
Rafael Espindola 00e861ed57 Support segmented stacks on 64-bit FreeBSD.
This patch uses tcb_spare field in the tcb structure to store info.
Patch by Jyun-Yan You.

llvm-svn: 148041
2012-01-12 20:24:30 +00:00
Rafael Espindola 10745d3381 Support segmented stacks on win32.
Uses the pvArbitrary slot of the TIB, which is reserved for applications. We
only support frames with a static size.

llvm-svn: 148040
2012-01-12 20:22:08 +00:00
Devang Patel 4a6e778aae Rename X86ATTAsmParser -> X86AsmParser
We are using one parser to parse att as well as intel style syntax.

llvm-svn: 148032
2012-01-12 18:03:40 +00:00
Benjamin Kramer 9ece950ddb After Jakob's r147938 exception handling on i386 was completely broken.
Restore the (obviously wrong) behavior from before r147938 without relying on
undefined behavior. Add a fat FIXME note.

This should fix nightly tester failures.

llvm-svn: 148030
2012-01-12 17:37:18 +00:00
Nadav Rotem 0a0a829bea Fix a bug in the AVX 256-bit shuffle code in cases where the splat element is on the boundary of two 128-bit vectors.
The attached testcase was stuck in an endless loop.

llvm-svn: 148027
2012-01-12 15:31:55 +00:00
Benjamin Kramer 5b3aa60b44 X86: Generalize the x << (y & const) optimization to also catch masks with more set bits set than 31 or 63.
llvm-svn: 148024
2012-01-12 12:41:34 +00:00
Devang Patel fc6be102ae Add predicate method check match memory operand size, if available.
In att style asm syntax memory operand size is derived from suffix attached with mnemonic.  In intel style asm syntax it is part of memory operand hence predicate method check is required to select appropriate instruction.

llvm-svn: 148006
2012-01-12 01:51:42 +00:00
Devang Patel 46831de240 Add intel style operand parser skeleton.
This is a work in progress.

llvm-svn: 148002
2012-01-12 01:36:43 +00:00
Chandler Carruth eb21da060b Switch all of the uses of my InsertDAGNode helper to follow the exact
same pattern. We already had this pattern is a few places, but others
tried to make a rough approximation of an actual DAG structure. As not
everywhere went to this trouble, nothing could rely on this being done.
In fact, I've checked all references to these node Ids, and the ones
that are using the topo-sort properties are actually satisfied with
a strict-weak-ordering. The requirement appears to be that Use >= Def.

I've added a big blurb of comments to this bit of the transform to
clarify why the order is so important for the next reader of the code.

I'm starting with this change as it is very small, and trivially
reverted if something breaks or the >= above really does need to be >.
If that proves the case, we can hide the problem by reverting this
patch, but the problem exists elsewhere as well, and so a more
comprehensive solution will be needed.

llvm-svn: 148001
2012-01-12 01:34:44 +00:00
Rafael Espindola d90466bcbf Support segmented stacks on mac.
This uses TLS slot 90, which actually belongs to JavaScriptCore. We only support
frames with static size
Patch by Brian Anderson.

llvm-svn: 147960
2012-01-11 19:00:37 +00:00
Rafael Espindola 4eecacb9c8 Generate the segmented stack prologue for fastcc too.
Patch by Brian Anderson.

llvm-svn: 147958
2012-01-11 18:41:19 +00:00
Chandler Carruth 3212a34269 Revert r147945 which disabled an addressing mode transformation. I had
hoped this would revive one of the llvm-gcc selfhost build bots, but it
didn't so it doesn't appear that my transform is the culprit.

If anyone else is seeing failures, please let me know!

llvm-svn: 147957
2012-01-11 18:36:12 +00:00
Rafael Espindola 2b89448d60 Use unsigned comparison in segmented stack prologue.
This is a comparison of two addresses, and GCC does the comparison unsigned.

Patch by Brian Anderson.

llvm-svn: 147954
2012-01-11 18:23:35 +00:00
Rafael Espindola 6635ae1c17 Explicitly set the scale to 1 on some segstack prologue instrs.
Patch by Brian Anderson.

llvm-svn: 147952
2012-01-11 18:14:03 +00:00
Jan Sjödin 21f83d9f36 Add XOP Intrinsics and tests
llvm-svn: 147949
2012-01-11 15:20:20 +00:00
Nadav Rotem baae7e4577 Fix a bug in the lowering of BUILD_VECTOR for AVX. SCALAR_TO_VECTOR does not zero untouched elements. Use INSERT_VECTOR_ELT instead.
llvm-svn: 147948
2012-01-11 14:07:51 +00:00
Chandler Carruth 9bc48e5215 Disable the transformation I added in r147936 to see if it fixes some
strange build bot failures that look like a miscompile into an infloop.
I'll investigate this tomorrow, but I'd both like to know whether my
patch is the culprit, and get the bots back to green.

llvm-svn: 147945
2012-01-11 12:17:47 +00:00
Chandler Carruth 3eacfb83fa Hoist a really redundant code pattern into a helper function, and delete
lots of lines of code. No functionality changed.

llvm-svn: 147942
2012-01-11 11:04:36 +00:00
Chandler Carruth b0049f4a43 Simplify the AND-rooted mask+shift checking code to match that of the
SRL-rooted code.

llvm-svn: 147941
2012-01-11 09:35:04 +00:00
Chandler Carruth 3dbcda8478 Unify the interface of the three mask+shift transform helpers, and
factor the differences that were hiding in one of them into its other
caller, the SRL handling code. No change in behavior.

llvm-svn: 147940
2012-01-11 09:35:02 +00:00
Chandler Carruth aa01e6661a Clarify and make explicit some of the requirements for transforming
mask+shift pairs at the beginning of the ISD::AND case block, and then
hoist the final pattern into a helper function, simplifying and
reflowing it appropriately. This should have no observable behavior
change, but several simplifications fell out of this such as directly
computing the new mask constant, etc.

llvm-svn: 147939
2012-01-11 09:35:00 +00:00
Jakob Stoklund Olesen 6039983755 Fix undefined code and reenable test case.
I don't think the compact encoding code is right, but at least is has
defined behavior now.

llvm-svn: 147938
2012-01-11 09:08:04 +00:00
Chandler Carruth 51d3076bbf Hoist the logic to transform shift+mask combinations into sub-register
extracts and scaled addressing modes into its own helper function. No
functionality changed here, just hoisting and layout fixes falling out
of that hoisting.

llvm-svn: 147937
2012-01-11 08:48:20 +00:00
Chandler Carruth 55b2cdee26 Teach the X86 instruction selection to do some heroic transforms to
detect a pattern which can be implemented with a small 'shl' embedded in
the addressing mode scale. This happens in real code as follows:

  unsigned x = my_accelerator_table[input >> 11];

Here we have some lookup table that we look into using the high bits of
'input'. Each entity in the table is 4-bytes, which means this
implicitly gets turned into (once lowered out of a GEP):

  *(unsigned*)((char*)my_accelerator_table + ((input >> 11) << 2));

The shift right followed by a shift left is canonicalized to a smaller
shift right and masking off the low bits. That hides the shift right
which x86 has an addressing mode designed to support. We now detect
masks of this form, and produce the longer shift right followed by the
proper addressing mode. In addition to saving a (rather large)
instruction, this also reduces stalls in Intel chips on benchmarks I've
measured.

In order for all of this to work, one part of the DAG needs to be
canonicalized *still further* than it currently is. This involves
removing pointless 'trunc' nodes between a zextload and a zext. Without
that, we end up generating spurious masks and hiding the pattern.

llvm-svn: 147936
2012-01-11 08:41:08 +00:00
Lang Hames 995c63329a Fixed order of operands in comment to match code.
llvm-svn: 147890
2012-01-10 22:53:20 +00:00
Joerg Sonnenberger 96cd35cf6d Default stack alignment for 32bit x86 should be 4 Bytes, not 8 Bytes.
Add a test that checks the stack alignment of a simple function for
Darwin, Linux and NetBSD for 32bit and 64bit mode.

llvm-svn: 147888
2012-01-10 22:43:53 +00:00
Chad Rosier 1a8f0ccd8c Add missing VEX predicates to VMOVSDto64rr/VMOVSDto64mr. This fixes a few
failing test cases on our internal AVX nightly tester.
rdar://10663637

llvm-svn: 147881
2012-01-10 22:14:06 +00:00
Bill Wendling d5ab02600e For i386, don't use the generic code.
As the comment around 7746 says, it's better to use the x87 extended precision
here than SSE. And the generic code doesn't know how to do that. It also regains
the speed lost for the uint64_to_float.c testcase.
<rdar://problem/10669858>

llvm-svn: 147869
2012-01-10 19:41:30 +00:00
Devang Patel 67bf992a8f Add definition for intel asm variant.
Right now, this just adds additional entries in match table. The parser does not use them yet.

llvm-svn: 147859
2012-01-10 17:51:54 +00:00
David Blaikie edbb58c577 Remove unnecessary default cases in switches that cover all enum values.
llvm-svn: 147855
2012-01-10 16:47:17 +00:00
Benjamin Kramer 077ae1d760 Add definitions for AMD's bobcat (aka btver1)
llvm-svn: 147846
2012-01-10 11:50:02 +00:00
Craig Topper 430f3f1bd6 Fix a crash in AVX2 when trying to broadcast a double into a 128-bit vector. There is no vbroadcastsd xmm, but we do need to support 64-bit integers broadcasted into xmm. Also factor the AVX check into the isVectorBroadcast function. This makes more sense since the AVX2 check was already inside.
llvm-svn: 147844
2012-01-10 08:23:59 +00:00
Craig Topper b0c0f72ae6 Remove hasXMM/hasXMMInt functions. Move callers to hasSSE1/hasSSE2. This is the final piece to remove the AVX hack that disabled SSE.
llvm-svn: 147843
2012-01-10 06:54:16 +00:00
Craig Topper d97bbd7b60 Remove hasSSE*orAVX functions and change all callers to use just hasSSE*. AVX is now an SSE level and no longer disables SSE checks.
llvm-svn: 147842
2012-01-10 06:37:29 +00:00
Craig Topper eb8f9e9e5b Instruction selection priority fixes to remove the XMM/XMMInt/orAVX predicates. Another commit will remove orAVX functions from X86SubTarget.
llvm-svn: 147841
2012-01-10 06:30:56 +00:00
Devang Patel 29ba4f97e6 Fix asm string wrt variants.
llvm-svn: 147805
2012-01-09 21:32:02 +00:00
Devang Patel 85d684a4d9 Split AsmParser into two components - AsmParser and AsmParserVariant
AsmParser holds info specific to target parser.
AsmParserVariant holds info specific to asm variants supported by the target.

llvm-svn: 147787
2012-01-09 19:13:28 +00:00
Chandler Carruth c16622daff Don't rely on the fact that shift values are never very large, and thus
this substraction will result in small negative numbers at worst which
become very large positive numbers on assignment and are thus caught by
the <=4 check on the next line. The >0 check clearly intended to catch
these as negative numbers.

Spotted by inspection, and impossible to trigger given the shift widths
that can be used.

llvm-svn: 147773
2012-01-09 09:47:25 +00:00
Craig Topper f287a4509e Remove AVX hack in X86Subtarget. AVX/AVX2 are now treated as an SSE level. Predicate functions have been altered to maintain previous names and behavior.
llvm-svn: 147770
2012-01-09 09:02:13 +00:00