Commit Graph

2285 Commits

Author SHA1 Message Date
Evan Cheng 75315b877c For something like
uint32_t hi(uint64_t res)
{
        uint_32t hi = res >> 32;
        return !hi;
}

llvm IR looks like this:
define i32 @hi(i64 %res) nounwind uwtable ssp {
entry:
  %lnot = icmp ult i64 %res, 4294967296
  %lnot.ext = zext i1 %lnot to i32
  ret i32 %lnot.ext
}

The optimizer has optimize away the right shift and truncate but the resulting
constant is too large to fit in the 32-bit immediate field. The resulting x86
code is worse as a result:
        movabsq $4294967296, %rax       ## imm = 0x100000000
        cmpq    %rax, %rdi
        sbbl    %eax, %eax
        andl    $1, %eax

This patch teaches the x86 lowering code to handle ult against a large immediate
with trailing zeros. It will issue a right shift and a truncate followed by
a comparison against a shifted immediate.
        shrq    $32, %rdi
        testl   %edi, %edi
        sete    %al
        movzbl  %al, %eax

It also handles a ugt comparison against a large immediate with trailing bits
set. i.e. X >  0x0ffffffff -> (X >> 32) >= 1

rdar://11866926

llvm-svn: 160312
2012-07-16 19:35:43 +00:00
Nadav Rotem eec74c7279 Teach getTargetVShiftNode about TargetConstant nodes.
llvm-svn: 160234
2012-07-15 20:27:43 +00:00
Nadav Rotem 9466e81df6 AVX: Fix a bug in getTargetVShiftNode. The shift amount has to be a 128bit vector with the same element type as the input vector.
This is needed because of the patterns we have for the VP[SLL/SRA/SRL][W/D/Q] instructions.

llvm-svn: 160222
2012-07-14 22:26:05 +00:00
Benjamin Kramer 4d0916788d Give the rdrand instructions a SideEffect flag and a chain so MachineCSE and MachineLICM don't touch it.
I already had the necessary things in place for IR-level passes but missed the machine passes.

llvm-svn: 160137
2012-07-12 18:14:57 +00:00
Benjamin Kramer 0ab2794eda Add intrinsics for Ivy Bridge's rdrand instruction.
The rdrand/cmov sequence is the same that is emitted by both
GCC and ICC.

Fixes PR13284.

llvm-svn: 160117
2012-07-12 09:31:43 +00:00
Nadav Rotem d2bdcebb14 When ext-loading and trunc-storing vectors to memory, on x86 32bit systems, allow loads/stores of 64bit values from xmm registers.
llvm-svn: 160044
2012-07-11 13:27:05 +00:00
Nadav Rotem d908ddc186 Improve the loading of load-anyext vectors by allowing the codegen to load
multiple scalars and insert them into a vector. Next, we shuffle the elements
into the correct places, as before.
Also fix a small dagcombine bug in SimplifyBinOpWithSameOpcodeHands, when the
migration of bitcasts happened too late in the SelectionDAG process.

llvm-svn: 159991
2012-07-10 13:25:08 +00:00
Jakob Stoklund Olesen d14101e0b9 Make X86 call and return instructions non-variadic.
Function argument and return value registers aren't part of the
encoding, so they should be implicit operands.

llvm-svn: 159728
2012-07-04 23:53:27 +00:00
Jakob Stoklund Olesen 2dee812445 Ensure CopyToReg nodes are always glued to the call instruction.
The CopyToReg nodes that set up the argument registers before a call
must be glued to the call instruction. Otherwise, the scheduler may emit
the physreg copies long before the call, causing long live ranges for
the fixed registers.

Besides disabling good register allocation, that can also expose
problems when EmitInstrWithCustomInserter() splits a basic block during
the live range of a physreg.

llvm-svn: 159721
2012-07-04 19:28:31 +00:00
Elena Demikhovsky 9af899fa88 Optimization of shuffle node that can fit to the register form of VBROADCAST instruction on AVX2.
llvm-svn: 159504
2012-07-01 06:12:26 +00:00
Rafael Espindola efdfb1e6b2 In the initial exec mode we always do a load to find the address of a variable.
Before this patch in pic 32 bit code we would add the global base register
and not load from that address. This is a really old bug, but before the
introduction of the tls attributes we would never select initial exec for
pic code.

llvm-svn: 159409
2012-06-29 04:22:35 +00:00
Elena Demikhovsky 863d2d3235 Removed unused variable
llvm-svn: 159197
2012-06-26 10:50:07 +00:00
Bill Wendling 8ed44466c2 Rename to match other X86_64* names.
llvm-svn: 159196
2012-06-26 10:05:06 +00:00
Elena Demikhovsky 26088d2e24 Shuffle optimization for AVX/AVX2.
The current patch optimizes frequently used shuffle patterns and gives these instruction sequence reduction.
Before:
      vshufps $-35, %xmm1, %xmm0, %xmm2 ## xmm2 = xmm0[1,3],xmm1[1,3]
       vpermilps       $-40, %xmm2, %xmm2 ## xmm2 = xmm2[0,2,1,3]
       vextractf128    $1, %ymm1, %xmm1
       vextractf128    $1, %ymm0, %xmm0
       vshufps $-35, %xmm1, %xmm0, %xmm0 ## xmm0 = xmm0[1,3],xmm1[1,3]
       vpermilps       $-40, %xmm0, %xmm0 ## xmm0 = xmm0[0,2,1,3]
       vinsertf128     $1, %xmm0, %ymm2, %ymm0
After:
      vshufps $13, %ymm0, %ymm1, %ymm1 ## ymm1 = ymm1[1,3],ymm0[0,0],ymm1[5,7],ymm0[4,4]
      vshufps $13, %ymm0, %ymm0, %ymm0 ## ymm0 = ymm0[1,3,0,0,5,7,4,4]
      vunpcklps       %ymm1, %ymm0, %ymm0 ## ymm0 = ymm0[0],ymm1[0],ymm0[1],ymm1[1],ymm0[4],ymm1[4],ymm0[5],ymm1[5]

llvm-svn: 159188
2012-06-26 08:04:10 +00:00
Eli Friedman bbcd09cc00 Make some ugly hacks for inline asm operands which name a specific register a bit more thorough. PR13196.
llvm-svn: 159176
2012-06-25 23:42:33 +00:00
Jakob Stoklund Olesen 2e22e6a361 %RCX is not a function live-out in eh.return functions.
The function live-out registers must be live at all function returns,
and %RCX is only used by eh.return. When a function also has a normal
return, only %RAX holds a return value.

This fixes PR13188.

llvm-svn: 159116
2012-06-24 15:53:01 +00:00
Pete Cooper 3c680dec8a Remove code i'd been testing with but didn't mean to commit. Oops
llvm-svn: 159094
2012-06-24 00:08:36 +00:00
Pete Cooper fe212e762f DAG legalisation can now handle illegal fma vector types by scalarisation
llvm-svn: 159092
2012-06-24 00:05:44 +00:00
Rafael Espindola a3088f09b3 Handle aliases to tls variables in all architectures, not just x86.
llvm-svn: 159058
2012-06-23 00:30:03 +00:00
Craig Topper b9e8e18949 Don't insert 128-bit UNDEF into 256-bit vectors. Just keep the 256-bit vector. Original patch by Elena Demikhovsky. Tweaked by me to allow possibility of covering more cases.
llvm-svn: 158792
2012-06-20 05:39:26 +00:00
Rafael Espindola ca3e0ee8b3 Move the support for using .init_array from ARM to the generic
TargetLoweringObjectFileELF. Use this to support it on X86. Unlike ARM,
on X86 it is not easy to find out if .init_array should be used or not, so
the decision is made via TargetOptions and defaults to off.

Add a command line option to llc that enables it.

llvm-svn: 158692
2012-06-19 00:48:28 +00:00
Craig Topper a54893c662 Use XOP vpcom intrinsics in patterns instead of a target specific SDNode type. Remove the custom lowering code that selected the SDNode type.
llvm-svn: 158279
2012-06-09 17:02:24 +00:00
Craig Topper 3352ba55b9 Replace XOP vpcom intrinsics with fewer intrinsics that take the immediate as an argument.
llvm-svn: 158278
2012-06-09 16:46:13 +00:00
Manman Ren 6bc2d27073 Enable optimization for integer ABS on X86 if Subtarget has CMOV.
llvm-svn: 158220
2012-06-08 18:58:26 +00:00
Manman Ren 2cdc8afccf X86: optimize generated code for integer ABS
This patch will generate the following for integer ABS:
      movl    %edi, %eax
      negl    %eax
      cmovll  %edi, %eax
INSTEAD OF
      movl    %edi, %ecx
      sarl    $31, %ecx
      leal    (%rdi,%rcx), %eax
      xorl    %ecx, %eax

There exists a target-independent DAG combine for integer ABS, which converts
integer ABS to sar+add+xor. For X86, we match this pattern back to neg+cmov. 
This is implemented in PerformXorCombine.

rdar://10695237

llvm-svn: 158175
2012-06-07 22:39:10 +00:00
Nadav Rotem bbd40f67d8 Do not optimize the used bits of the x86 vselect condition operand, when the condition operand is a vector of 1-bit predicates.
This may happen on MIC devices.

llvm-svn: 158168
2012-06-07 20:53:48 +00:00
Manman Ren 746e4859d0 PR13046: we can't replace usage of SUB with CMP in the lowering phase.
It will cause assertion failure later on.

llvm-svn: 158160
2012-06-07 19:27:33 +00:00
Manman Ren ae02c5a93e X86: replace SUB with CMP if possible
This patch will optimize the following
    movq    %rdi, %rax
    subq    %rsi, %rax
    cmovsq  %rsi, %rdi
    movq    %rdi, %rax
to
    cmpq    %rsi, %rdi
    cmovsq  %rsi, %rdi
    movq    %rdi, %rax

Perform this optimization if the actual result of SUB is not used.

rdar: 11540023
llvm-svn: 158126
2012-06-07 00:42:47 +00:00
Benjamin Kramer bde9176663 Fix typos found by http://github.com/lyda/misspell-check
llvm-svn: 157885
2012-06-02 10:20:22 +00:00
Hans Wennborg 789acfb63d Implement the local-dynamic TLS model for x86 (PR3985)
This implements codegen support for accesses to thread-local variables
using the local-dynamic model, and adds a clean-up pass so that the base
address for the TLS block can be re-used between local-dynamic access on
an execution path.

llvm-svn: 157818
2012-06-01 16:27:21 +00:00
Jakob Stoklund Olesen 4f203ea34b Add support for return value promotion in X86 calling conventions.
Patch by Yiannis Tsiouris!

llvm-svn: 157757
2012-05-31 17:28:20 +00:00
Justin Holewinski aa58397b3c Change interface for TargetLowering::LowerCallTo and TargetLowering::LowerCall
to pass around a struct instead of a large set of individual values.  This
cleans up the interface and allows more information to be added to the struct
for future targets without requiring changes to each and every target.

NV_CONTRIB

llvm-svn: 157479
2012-05-25 16:35:28 +00:00
Craig Topper 53b4b73be9 Fix constant used for pshufb mask when lowering v16i8 shuffles. Bug introduced in r157043. Fixes PR12908.
llvm-svn: 157236
2012-05-22 06:09:38 +00:00
Craig Topper e88f2fd4f7 Allow 256-bit shuffles to still be split even if only half of the shuffle comes from two 128-bit pieces.
llvm-svn: 157175
2012-05-21 06:40:16 +00:00
Nadav Rotem c93e91da27 On Haswell, perfer storing YMM registers using a single instruction.
llvm-svn: 157129
2012-05-19 20:30:08 +00:00
Nadav Rotem 900c7cb7ce Add support for additional in-reg vbroadcast patterns
llvm-svn: 157127
2012-05-19 19:57:37 +00:00
Craig Topper 0cf4038c59 Simplify code a bit. No functional change intended.
llvm-svn: 157044
2012-05-18 07:07:36 +00:00
Craig Topper 92db928ee9 Simplify handling of v16i8 shuffles and fix a missed optimization.
llvm-svn: 157043
2012-05-18 06:42:06 +00:00
Hans Wennborg f9d0e44b82 Implement initial-exec TLS model for 32-bit PIC x86
This fixes a TODO from 2007 :) Previously, LLVM would emit the wrong
code here (see the update to test/CodeGen/X86/tls-pie.ll).

llvm-svn: 156611
2012-05-11 10:11:01 +00:00
Nadav Rotem 1a65397017 Fix merge-typo and cleanup
llvm-svn: 156541
2012-05-10 12:50:02 +00:00
Nadav Rotem 15946e50c1 AVX2: Add an additional broadcast idiom.
llvm-svn: 156540
2012-05-10 12:39:13 +00:00
Nadav Rotem b86a3fb8d0 Generate AVX/AVX2 shuffles even when there is a memory op somewhere else in the program.
Starting r155461 we are able to select patterns for vbroadcast even when the load op is used by other users.

Fix PR11900.

llvm-svn: 156539
2012-05-10 12:22:05 +00:00
Chad Rosier d8287fec17 Fix a regression from r147481. This combine should only happen if there is a
single use.
rdar://11360370

llvm-svn: 156316
2012-05-07 18:47:44 +00:00
Manman Ren ef4e0479ec X86: optimization for -(x != 0)
This patch will optimize -(x != 0) on X86
FROM 
cmpl	$0x01,%edi
sbbl	%eax,%eax
notl	%eax
TO
negl %edi
sbbl %eax %eax

In order to generate negl, I added patterns in Target/X86/X86InstrCompiler.td:
def : Pat<(X86sub_flag 0, GR32:$src), (NEG32r GR32:$src)>;

rdar: 10961709
llvm-svn: 156312
2012-05-07 18:06:23 +00:00
Craig Topper 00a1e6d48b Use MVT instead of EVT as the argument to all the shuffle decode functions. Simplify some of the decode functions.
llvm-svn: 156268
2012-05-06 19:46:21 +00:00
Craig Topper 804be3b546 Add VPERMQ/VPERMPD to the list of target specific shuffles that can be looked through for DAG combine purposes.
llvm-svn: 156266
2012-05-06 18:54:26 +00:00
Benjamin Kramer e31f31e5c0 Add a new target hook "predictableSelectIsExpensive".
This will be used to determine whether it's profitable to turn a select into a
branch when the branch is likely to be predicted.

Currently enabled for everything but Atom on X86 and Cortex-A9 devices on ARM.

I'm not entirely happy with the name of this flag, suggestions welcome ;)

llvm-svn: 156233
2012-05-05 12:49:14 +00:00
Craig Topper bdd2e34b1f Fix some loops to match coding standards. No functional change intended.
llvm-svn: 156159
2012-05-04 06:39:13 +00:00
Craig Topper d4d3237bb8 Fix up some spacing. No functional change.
llvm-svn: 156158
2012-05-04 06:18:33 +00:00
Craig Topper e2ae413746 Simplify broadcast lowering code. No functional change intended.
llvm-svn: 156157
2012-05-04 05:49:51 +00:00
Craig Topper 42f2182366 Allow v16i16 and v32i8 shuffles to be rewritten as narrower shuffles.
llvm-svn: 156156
2012-05-04 04:44:49 +00:00
Craig Topper 59063c0a3d Simplify shuffle narrowing code a bit. No functional change intended.
llvm-svn: 156154
2012-05-04 04:08:44 +00:00
Craig Topper 242183834a Use 'unsigned' instead of 'int' in a few places dealing with counts of vector elements.
llvm-svn: 156060
2012-05-03 07:26:59 +00:00
Craig Topper 315a5cc789 Fix 256-bit vpshuflw and vpshufhw immediate encoding to handle undefs in the lower half correctly. Missed in r155982.
llvm-svn: 156059
2012-05-03 07:12:59 +00:00
Preston Gurd 926afd7401 For Intel Atom, use ILP scheduling always, instead of ILP for 64 bit
and Hybrid for 32 bit, since benchmarks show ILP scheduling is better
most of the time.

llvm-svn: 156028
2012-05-02 22:02:02 +00:00
Manman Ren f02efc8731 Revert r155853
The commit is intended to fix rdar://10961709.
But it is the root cause of PR12720.
Revert it for now.

llvm-svn: 155992
2012-05-02 15:24:32 +00:00
Craig Topper c73bc39c22 Add support for selecting AVX2 vpshuflw and vpshufhw. Add decoding support for AsmPrinter.
llvm-svn: 155982
2012-05-02 08:03:44 +00:00
Manman Ren 425a55c1ce X86: optimization for max-like struct
This patch will optimize the following cases on X86
(a > b) ? (a-b) : 0
(a >= b) ? (a-b) : 0
(b < a) ? (a-b) : 0
(b <= a) ? (a-b) : 0

FROM
movl    %edi, %ecx
subl    %esi, %ecx
cmpl    %edi, %esi
movl    $0, %eax
cmovll  %ecx, %eax
TO
xorl    %eax, %eax
subl    %esi, %edi
cmovll  %eax, %edi
movl    %edi, %eax

rdar: 10734411
llvm-svn: 155919
2012-05-01 17:16:15 +00:00
Manman Ren 4f4d5c8fc8 X86: optimization for -(x != 0)
This patch will optimize -(x != 0) on X86
FROM 
cmpl	$0x01,%edi
sbbl	%eax,%eax
notl	%eax
TO
negl %edi
sbbl %eax %eax

llvm-svn: 155853
2012-04-30 22:51:25 +00:00
Chad Rosier d427d51c2b Tidy up. No functional change intended.
llvm-svn: 155832
2012-04-30 17:47:15 +00:00
Craig Topper 55b3990837 No need to normalize index before calling Extract128BitVector
llvm-svn: 155811
2012-04-30 05:17:10 +00:00
Jakub Staszak da03f3ba64 Remove unneeded casts. No functionality change.
llvm-svn: 155800
2012-04-29 20:52:53 +00:00
Craig Topper 3b94fa63d6 Simplify code a bit. No functional change intended.
llvm-svn: 155798
2012-04-29 20:22:05 +00:00
Craig Topper 0fa6c7e593 Use 'unsigned' instead of 'int' in several places when retrieving number of vector elements.
llvm-svn: 155742
2012-04-27 22:54:43 +00:00
Chad Rosier 32c2178ef3 Add x86-specific DAG combine to simplify:
x == -y --> x+y == 0
 x != -y --> x+y != 0

On x86, the generated code goes from
   negl    %esi
   cmpl    %esi, %edi
   je    .LBB0_2
to
   addl    %esi, %edi
   je    .L4

This case is correctly handled for ARM with "cmn".

Patch by Manman Ren.
rdar://11245199
PR12545

llvm-svn: 155739
2012-04-27 22:33:25 +00:00
Craig Topper 42cd8d2c00 Tidy up spacing.
llvm-svn: 155733
2012-04-27 21:05:09 +00:00
Benjamin Kramer 913da4b261 X86: Don't emit conditional floating point moves on when targeting pre-pentiumpro architectures.
* Model FPSW (the FPU status word) as a register.
* Add ISel patterns for the FUCOM*, FNSTSW and SAHF instructions.
* During Legalize/Lowering, build a node sequence to transfer the comparison
result from FPSW into EFLAGS. If you're wondering about the right-shift: That's
an implicit sub-register extraction (%ax -> %ah) which is handled later on by
the instruction selector.

Fixes PR6679. Patch by Christoph Erhardt!

llvm-svn: 155704
2012-04-27 12:07:43 +00:00
Craig Topper 5ff6dc34b9 Use vector_shuffles instead of target specific unpack nodes for AVX ZERO_EXTEND/ANY_EXTEND combine. These will be converted to target specific nodes during lowering. This is more consistent with other code.
llvm-svn: 155537
2012-04-25 06:39:39 +00:00
Nadav Rotem 7b7b99c74a AVX2: The BLENDPW instruction selects between vectors of v16i16 using an i8
immediate. We can't use it here because the shuffle code does not check that
the lower part of the word is identical to the upper part.

llvm-svn: 155440
2012-04-24 11:27:53 +00:00
Craig Topper 0b65c40821 Remove dangling spaces. Fix some other formatting.
llvm-svn: 155429
2012-04-24 06:36:35 +00:00
Craig Topper 6f2a535de2 Simplify code a bit and make it compile better. Remove unused parameters.
llvm-svn: 155428
2012-04-24 06:02:29 +00:00
Nadav Rotem 3f8acfc3c4 Optimize the vector UINT_TO_FP, SINT_TO_FP and FP_TO_SINT operations where the integer type is i8 (commonly used in graphics).
llvm-svn: 155397
2012-04-23 21:53:37 +00:00
Craig Topper 153bb34a3c Use MVT instead of EVT through all of LowerVECTOR_SHUFFLEtoBlend and not just the switch. Saves a little bit of binary size.
llvm-svn: 155339
2012-04-23 07:36:33 +00:00
Craig Topper 0a2c809d09 Make getZeroVector and getOnesVector more alike as far as how they detect 128-bit versus 256-bit vectors. Be explicit about both sizes and use llvm_unreachable. Similar changes to getLegalSplat.
llvm-svn: 155337
2012-04-23 07:24:41 +00:00
Craig Topper 2bbe8bcf4e Tidy up by removing some 'else' after 'return'
llvm-svn: 155336
2012-04-23 06:57:04 +00:00
Craig Topper 5c51eeecfc Tidy up spacing in LowerVECTOR_SHUFFLEtoBlend. Remove code that checks if shuffle operand has a different type than the the shuffle result since it can never happen.
llvm-svn: 155333
2012-04-23 06:38:28 +00:00
Craig Topper a52f0d09b6 Add a couple llvm_unreachables.
llvm-svn: 155332
2012-04-23 03:42:40 +00:00
Craig Topper 984dc015ae Remove some tab characers.
llvm-svn: 155331
2012-04-23 03:28:34 +00:00
Craig Topper ea428fd79c Remove some 'else' after 'return'. No functional change.
llvm-svn: 155330
2012-04-23 03:26:18 +00:00
Craig Topper bf7d5666f0 Make Extract128BitVector and Insert128BitVector take an unsigned instead of an ConstantNode SDValue. getConstant was almost always called just before only to have the functions take it apart and build a new ConstantSDNode.
llvm-svn: 155325
2012-04-22 20:55:18 +00:00
Craig Topper 2d474d6d92 Convert getNode(UNDEF) to getUNDEF.
llvm-svn: 155321
2012-04-22 19:29:34 +00:00
Craig Topper 860ed0d20a Make calls to getVectorShuffle more consistent. Use shuffle VT for calls to getUNDEF instead of requerying. Use &Mask[0] instead of Mask.data().
llvm-svn: 155320
2012-04-22 19:17:57 +00:00
Craig Topper 43397c0900 Tidy up. 80 columns and argument alignment.
llvm-svn: 155319
2012-04-22 18:51:37 +00:00
Craig Topper ad56a744f1 Simplify code by converting multiple places that were manually concatenating 128-bit vectors to use either CONCAT_VECTORS or a helper function. CONCAT_VECTORS will itself be lowered to the same pattern as before. The helper function is needed for concats of BUILD_VECTORs since getNode(CONCAT_VECTORS) will just return a large BUILD_VECTOR and we may be trying to lower large BUILD_VECTORS when this occurs.
llvm-svn: 155318
2012-04-22 18:15:59 +00:00
Elena Demikhovsky 8d7e56c409 ZERO_EXTEND/SIGN_EXTEND/TRUNCATE optimization for AVX2
llvm-svn: 155309
2012-04-22 09:39:03 +00:00
Craig Topper 6eadae8e60 Make some fixed arrays const. Use array_lengthof in a couple places instead of a hardcoded number.
llvm-svn: 155294
2012-04-21 18:58:38 +00:00
Craig Topper 2568bf3089 Tidy up. 80 columns and some other spacing issues.
llvm-svn: 155291
2012-04-21 18:13:35 +00:00
Craig Topper abadc660e0 Convert some uses of XXXRegisterClass to &XXXRegClass. No functional change since they are equivalent.
llvm-svn: 155186
2012-04-20 06:31:50 +00:00
Craig Topper d3c9e404ba Remove AVX vpermil intrinsics. I removed their uses from clang headers and builtins a while back.
llvm-svn: 154985
2012-04-18 05:24:00 +00:00
Craig Topper 354103d8ca Don't decode vperm2i128 or vperm2f128 into a shuffle if bit 3 or 7 of the immediate is set.
llvm-svn: 154907
2012-04-17 05:54:54 +00:00
Richard Smith 12da79b859 Fix incorrect atomics codegen introduced in r154705, and extend test to catch it.
llvm-svn: 154845
2012-04-16 18:43:53 +00:00
Craig Topper 4badeb3f0d Replace vpermd/vpermps intrinic patterns with custom lowering to target specific nodes.
llvm-svn: 154801
2012-04-16 07:13:00 +00:00
Craig Topper 26d7a94981 Change type profile for vpermv back to using operand type for the mask argument to match intrinsic behavior. Add a bitcast to the lowering code to convert mask from v8i32 to v8f32 for vpermps.
llvm-svn: 154798
2012-04-16 06:43:40 +00:00
Craig Topper b86fa404d3 Merge vpermps/vpermd and vpermpd/vpermq SD nodes.
llvm-svn: 154782
2012-04-16 00:41:45 +00:00
Craig Topper 1f8c9eb925 Spacing fixes and 80 column fixes. Use 0 instead of 0x80 for undef indices in vpermps/vpermd. Hardware only looks at lower 3-bits.
llvm-svn: 154780
2012-04-15 23:48:57 +00:00
Elena Demikhovsky 779a72b49e Added VPERM optimization for AVX2 shuffles
llvm-svn: 154761
2012-04-15 11:18:59 +00:00
Richard Smith 3e8f1f6aea Fix X86 codegen for 'atomicrmw nand' to generate *x = ~(*x & y), not *x = ~*x & y.
llvm-svn: 154705
2012-04-13 22:47:00 +00:00
Nadav Rotem 372cf15125 remove unused argument
llvm-svn: 154494
2012-04-11 11:05:21 +00:00
Nadav Rotem 9bc178ac5c Reapply 154396 after fixing a test.
Original message:
Modify the code that lowers shuffles to blends from using blendvXX to vblendXX.
blendV uses a register for the selection while Vblend uses an immediate.
On sandybridge they still have the same latency and execute on the same execution ports.

llvm-svn: 154483
2012-04-11 06:40:27 +00:00
Chad Rosier f7345b027a Whitespace.
llvm-svn: 154427
2012-04-10 19:42:07 +00:00
Chad Rosier 235a7a1746 Revert r154396, which looks to be the real culprit behind the bot failures.
llvm-svn: 154426
2012-04-10 19:39:18 +00:00
Eric Christopher 65ada95b84 Temporarily revert this patch to see if it brings the buildbots back.
llvm-svn: 154425
2012-04-10 19:33:16 +00:00
David Blaikie 2735136655 Remove unused variable.
llvm-svn: 154398
2012-04-10 15:23:13 +00:00
Nadav Rotem f934f91709 Modify the code that lowers shuffles to blends from using blendvXX to vblendXX.
blendv uses a register for the selection while vblend uses an immediate.
On sandybridge they still have the same latency and execute on the same execution ports.

llvm-svn: 154396
2012-04-10 14:33:13 +00:00
Evan Cheng f8bad08001 Fix a long standing tail call optimization bug. When a libcall is emitted
legalizer always use the DAG entry node. This is wrong when the libcall is
emitted as a tail call since it effectively folds the return node. If
the return node's input chain is not the entry (i.e. call, load, or store)
use that as the tail call input chain.

PR12419
rdar://9770785
rdar://11195178

llvm-svn: 154370
2012-04-10 01:51:00 +00:00
Nadav Rotem fb7e2ae53c Lower some x86 shuffle sequences to the vblend family of instructions.
llvm-svn: 154313
2012-04-09 08:33:21 +00:00
Nadav Rotem b801ca3976 Fix a bug in the lowering of broadcasts: ConstantPools need to use the target pointer type.
Move NormalizeVectorShuffle and LowerVectorBroadcast into X86TargetLowering.

llvm-svn: 154310
2012-04-09 07:45:58 +00:00
Chandler Carruth 16f0ebcbb5 Move the TLSModel information into the TargetMachine rather than hiding
in TargetLowering. There was already a FIXME about this location being
odd. The interface is simplified as a consequence. This will also make
it easier to change TLS models when compiling with PIE.

llvm-svn: 154292
2012-04-08 17:20:55 +00:00
Nadav Rotem 82609df647 AVX2: Build splat vectors by broadcasting a scalar from the constant pool.
Previously we used three instructions to broadcast an immediate value into a
vector register.
On Sandybridge we continue to load the broadcasted value from the constant pool.

llvm-svn: 154284
2012-04-08 12:54:54 +00:00
Benjamin Kramer 3cacabfb04 Fix narrowing conversion.
llvm-svn: 154171
2012-04-06 13:33:52 +00:00
Craig Topper 447417c932 Allow 256-bit shuffles to be split if a 128-bit lane contains elements from a single source. This is a rewrite of the 256-bit shuffle splitting code based on similar code from legalize types. Fixes PR12413.
llvm-svn: 154166
2012-04-06 07:45:23 +00:00
Rafael Espindola ba0a6cabb8 Always compute all the bits in ComputeMaskedBits.
This allows us to keep passing reduced masks to SimplifyDemandedBits, but
know about all the bits if SimplifyDemandedBits fails. This allows instcombine
to simplify cases like the one in the included testcase.

llvm-svn: 154011
2012-04-04 12:51:34 +00:00
Nadav Rotem b078350872 This commit contains a few changes that had to go in together.
1. Simplify xor/and/or (bitcast(A), bitcast(B)) -> bitcast(op (A,B))
   (and also scalar_to_vector).

2. Xor/and/or are indifferent to the swizzle operation (shuffle of one src).
   Simplify xor/and/or (shuff(A), shuff(B)) -> shuff(op (A, B))

3. Optimize swizzles of shuffles:  shuff(shuff(x, y), undef) -> shuff(x, y).

4. Fix an X86ISelLowering optimization which was very bitcast-sensitive.

Code which was previously compiled to this:

movd    (%rsi), %xmm0
movdqa  .LCPI0_0(%rip), %xmm2
pshufb  %xmm2, %xmm0
movd    (%rdi), %xmm1
pshufb  %xmm2, %xmm1
pxor    %xmm0, %xmm1
pshufb  .LCPI0_1(%rip), %xmm1
movd    %xmm1, (%rdi)
ret

Now compiles to this:

movl    (%rsi), %eax
xorl    %eax, (%rdi)
ret

llvm-svn: 153848
2012-04-01 19:31:22 +00:00
Craig Topper 9cfc69c779 Spacing fixes and using 'unsigned' instead of 'int' to index to select shuffle elements for consistency with other shuffle code in X86 backend.
llvm-svn: 153154
2012-03-21 02:14:01 +00:00
Craig Topper b34d96c614 Remove code that prevented lowering shuffles if they are used by load and themselves used by a extract_vector_elt. This was done to allow the DAG combiner to collapse to a single element load. Unfortunately, sometimes the extract_vector_elt would disappear before DAG combine could do the transformation leaving a vector_shuffle that isel couldn't handle. New code lets the shuffle be converted to a target specific node, but then adds a combine routine that can convert target specific nodes back to vector_shuffles if the folding criteria are met.
llvm-svn: 153080
2012-03-20 07:17:59 +00:00
Craig Topper cbc96a6e90 Factor out target shuffle mask decoding from getShuffleScalarElt and use a SmallVector of int instead of unsigned for shuffle mask in decode functions. Preparation for another change.
llvm-svn: 153079
2012-03-20 06:42:26 +00:00
Craig Topper 129f9ef669 isCommutedMOVLMask should only look at 128-bit vectors to match isMOVLMask.
llvm-svn: 153027
2012-03-18 22:50:10 +00:00
Craig Topper bef78fc2ee Convert more static tables of registers used by calling convention to uint16_t to reduce space.
llvm-svn: 152538
2012-03-11 07:57:25 +00:00
Chad Rosier 9424aa1c51 Address Evan's comments for r151877.
Specifically, remove the magic number when checking to see if the copy has a 
glue operand and simplify the checking logic.

rdar://10930395

llvm-svn: 152041
2012-03-05 19:27:12 +00:00
Chad Rosier f5e086f18e Prevent obscure and incorrect tail-call optimization.
In this instance we are generating the tail-call during legalizeDAG.  The 2nd
floor call can't be a tail call because it clobbers %xmm1, which is defined by
the first floor call.  The first floor call can't be a tail-call because it's
not in the tail position.  The only reasonable way I could think to fix this
in a target-independent manner was to check for glue logic on the copy reg.

rdar://10930395

llvm-svn: 151877
2012-03-02 02:50:46 +00:00
Evan Cheng 65f9d19c4f Re-commit r151623 with fix. Only issue special no-return calls if it's a direct call.
llvm-svn: 151645
2012-02-28 18:51:51 +00:00
Daniel Dunbar ee7b899343 Revert r151623 "Some ARM implementaions, e.g. A-series, does return stack prediction. ...", it is breaking the Clang build during the Compiler-RT part.
llvm-svn: 151630
2012-02-28 15:36:07 +00:00
Evan Cheng 87c7b09d8d Some ARM implementaions, e.g. A-series, does return stack prediction. That is,
the processor keeps a return addresses stack (RAS) which stores the address
and the instruction execution state of the instruction after a function-call
type branch instruction.

Calling a "noreturn" function with normal call instructions (e.g. bl) can
corrupt RAS and causes 100% return misprediction so LLVM should use a
unconditional branch instead. i.e.
mov lr, pc
b _foo
The "mov lr, pc" is issued in order to get proper backtrace.

rdar://8979299

llvm-svn: 151623
2012-02-28 06:42:03 +00:00
NAKAMURA Takumi bdf94879df Target/X86: Fix assertion failures and warnings caused by r151382 _ftol2 lowering for i386-*-win32 targets. Patch by Joe Groff.
[Joe Groff] Hi everyone. My previous patch applied as r151382 had a few problems:
Clang raised a warning, and X86 LowerOperation would assert out for
fptoui f64 to i32 because it improperly lowered to an illegal
BUILD_PAIR. Here's a patch that addresses these issues. Let me know if
any other changes are necessary. Thanks.

llvm-svn: 151432
2012-02-25 03:37:25 +00:00
Michael J. Spencer 248d65e78b Add WIN_FTOL_* psudo-instructions to model the unique calling convention
used by the Win32 _ftol2 runtime function. Patch by Joe Groff!

llvm-svn: 151382
2012-02-24 19:01:22 +00:00
Craig Topper 760b134ffa Make all pointers to TargetRegisterClass const since they are all pointers to static data that should not be modified.
llvm-svn: 151134
2012-02-22 05:59:10 +00:00
Craig Topper de121a1000 Remove some unneeded includes and fix ordering in X86ISelLowering.cpp. Remove unneeded 'using namespace'.
llvm-svn: 150916
2012-02-19 07:15:48 +00:00
Craig Topper 65a4ceea1e Unify all shuffle mask checking functions take a mask and VT instead of VectorShuffleSDNode.
llvm-svn: 150913
2012-02-19 05:41:45 +00:00
Craig Topper 3e5c04e432 Make a bunch of X86ISelLowering shuffle functions static now that they are no longer needed by isel.
llvm-svn: 150908
2012-02-19 02:53:47 +00:00
Jakob Stoklund Olesen 97e3115dc2 Use the same CALL instructions for Windows as for everything else.
The different calling conventions and call-preserved registers are
represented with regmask operands that are added dynamically.

llvm-svn: 150708
2012-02-16 17:56:02 +00:00
Jakob Stoklund Olesen 8a450cb2fa Enable register mask operands for x86 calls.
Call instructions no longer have a list of 43 call-clobbered registers.
Instead, they get a single register mask operand with a bit vector of
call-preserved registers.

This saves a lot of memory, 42 x 32 bytes = 1344 bytes per call
instruction, and it speeds up building call instructions because those
43 imp-def operands no longer need to be added to use-def lists. (And
removed and shifted and re-added for every explicit call operand).

Passes like LiveVariables, LiveIntervals, RAGreedy, PEI, and
BranchFolding are significantly faster because they can deal with call
clobbers in bulk.

Overall, clang -O2 is between 0% and 8% faster, uniformly distributed
depending on call density in the compiled code.  Debug builds using
clang -O0 are 0% - 3% faster.

I have verified that this patch doesn't change the assembly generated
for the LLVM nightly test suite when building with -disable-copyprop
and -disable-branch-fold.

Branch folding behaves slightly differently in a few cases because call
instructions have different hash values now.

Copy propagation flushes its data structures when it crosses a register
mask operand. This causes it to leave a few dead copies behind, on the
order of 20 instruction across the entire nightly test suite, including
SPEC. Fixing this properly would require the pass to use different data
structures.

llvm-svn: 150638
2012-02-16 00:02:50 +00:00
Craig Topper 87119fa37f Update CanXFormVExtractWithShuffleIntoLoad to ensure bitcasts of loads only have one use. Matches DAGCombiner and prevents vector_shuffles from reaching isel.
llvm-svn: 150360
2012-02-13 04:30:38 +00:00
Anton Korobeynikov c6b4017ce2 Add support for implicit TLS model used with MS VC runtime.
Patch by Kai Nacke!

llvm-svn: 150307
2012-02-11 17:26:53 +00:00
Craig Topper 11826a6e10 Fix shuffle lowering code to stop creating temporary DAG nodes to do shuffle mask checks on. This seemed to be confusing things such that vector_shuffle ops to got through to iselection. This is another step towards removing the vector_shuffle handling patterns from isel.
llvm-svn: 150296
2012-02-11 06:24:48 +00:00
Elena Demikhovsky 1adc1d53dd Fixed a bug in printing "cmp" pseudo ops.
> This IR code
> %res = call <8 x float> @llvm.x86.avx.cmp.ps.256(<8 x float> %a0, <8 x float> %a1, i8 14)
> fails with assertion:
>
> llc: X86ATTInstPrinter.cpp:62: void llvm::X86ATTInstPrinter::printSSECC(const llvm::MCInst*, unsigned int, llvm::raw_ostream&): Assertion `0 && "Invalid ssecc argument!"' failed.
> 0  llc             0x0000000001355803
> 1  llc             0x0000000001355dc9
> 2  libpthread.so.0 0x00007f79a30575d0
> 3  libc.so.6       0x00007f79a23a1945 gsignal + 53
> 4  libc.so.6       0x00007f79a23a2f21 abort + 385
> 5  libc.so.6       0x00007f79a239a810 __assert_fail + 240
> 6  llc             0x00000000011858d5 llvm::X86ATTInstPrinter::printSSECC(llvm::MCInst const*, unsigned int, llvm::raw_ostream&) + 119

I added the full testing for all possible pseudo-ops of cmp.
I extended X86AsmPrinter.cpp and X86IntelInstPrinter.cpp.

You'l also see lines alignments (unrelated to this fix) in X86IselLowering.cpp from my previous check-in.

llvm-svn: 150068
2012-02-08 08:37:26 +00:00
Craig Topper 5405571fe0 Remove GCC builtins for vpermilp* intrinsics as clang no longer needs them. Custom lower the intrinsics to the vpermilp target specific node and remove intrinsic patterns.
llvm-svn: 150060
2012-02-08 06:36:57 +00:00
Craig Topper b27fd77c3f Add instruction selection for 256-bit VPSHUFD and 128-bit VPERMILPS/VPERMILPD.
llvm-svn: 149968
2012-02-07 06:28:42 +00:00
Chris Lattner 8213c8af29 Remove some dead code and tidy things up now that vectors use ConstantDataVector
instead of always using ConstantVector.

llvm-svn: 149912
2012-02-06 21:56:39 +00:00
Benjamin Kramer 2496717052 X86: Don't call malloc for 4 bits. No functionality change.
llvm-svn: 149866
2012-02-06 12:06:18 +00:00
Craig Topper 1f71057747 Add shuffle decoding support for 256-bit pshufd. Merge vpermilp* and pshufd decoding.
llvm-svn: 149859
2012-02-06 07:17:51 +00:00
Duncan Sands ae22c60f90 Persuade GCC that there is nothing worth warning about here (there isn't).
llvm-svn: 149834
2012-02-05 14:20:11 +00:00
Craig Topper 4ed7278ff4 Convert assert(0) to llvm_unreachable in X86 Target directory.
llvm-svn: 149809
2012-02-05 05:38:58 +00:00
Craig Topper 83f3bdaa45 Convert some assert(0) in default of switch statements to llvm_unreachable.
llvm-svn: 149808
2012-02-05 03:43:23 +00:00
Craig Topper 1d471e31ba Add target specific node for PMULUDQ. Change patterns to use it and custom lower intrinsics to it. Use it instead of intrinsic to handle 64-bit vector multiplies.
llvm-svn: 149807
2012-02-05 03:14:49 +00:00
Craig Topper 47e6d26911 Remove getShuffleVPERMILPImmediate function, getShuffleSHUFImmediate performs the same calculation.
llvm-svn: 149683
2012-02-03 06:52:33 +00:00
Craig Topper d5ffe0900d Remove unnecessary qualification on 256-bit vector handling in LowerBUILD_VECTOR. Condition was already guaranteed by earlier code.
llvm-svn: 149680
2012-02-03 06:32:21 +00:00
Lang Hames bb682450f9 Incorporate suggestions Chad, Jakob and Evan's suggestions on r149957.
llvm-svn: 149655
2012-02-03 01:13:49 +00:00
Jakob Stoklund Olesen 5e1ac45b93 Require non-NULL register masks.
It doesn't seem worthwhile to give meaning to a NULL register mask
pointer. It complicates all the code using register mask operands.

llvm-svn: 149646
2012-02-02 23:52:57 +00:00
Elena Demikhovsky 6fbb4d2842 Minor change in signature of the getZeroVector()
llvm-svn: 149601
2012-02-02 09:20:18 +00:00
Elena Demikhovsky fb44980b41 Optimization for SIGN_EXTEND operation on AVX.
Special handling was added for v4i32 -> v4i64 and v8i16 -> v8i32
extensions.

llvm-svn: 149600
2012-02-02 09:10:43 +00:00
Francois Pichet 26f302d568 Unbreak the MSVC build.
llvm-svn: 149599
2012-02-02 08:36:09 +00:00
Lang Hames 0269caafa6 Set EFLAGS correctly in EmitLoweredSelect on X86.
llvm-svn: 149597
2012-02-02 07:48:37 +00:00
Andrew Trick 8523b16ff5 Instruction scheduling itinerary for Intel Atom.
Adds an instruction itinerary to all x86 instructions, giving each a default latency of 1, using the InstrItinClass IIC_DEFAULT.

Sets specific latencies for Atom for the instructions in files X86InstrCMovSetCC.td, X86InstrArithmetic.td, X86InstrControl.td, and X86InstrShiftRotate.td. The Atom latencies for the remainder of the x86 instructions will be set in subsequent patches.

Adds a test to verify that the scheduler is working.

Also changes the scheduling preference to "Hybrid" for i386 Atom, while leaving x86_64 as ILP.

Patch by Preston Gurd!

llvm-svn: 149558
2012-02-01 23:20:51 +00:00
Mon P Wang 9f05206659 Avoid creating an extract element to an illegal type after LegalizeTypes has run.
llvm-svn: 149548
2012-02-01 22:15:20 +00:00
Chad Rosier e273cb08c4 Tidy up.
llvm-svn: 149521
2012-02-01 18:45:51 +00:00
Elena Demikhovsky 34cca175ab Shortened code in shuffle masks
llvm-svn: 149493
2012-02-01 10:33:05 +00:00
Elena Demikhovsky 0e48c70ba7 Optimization for "truncate" operation on AVX.
Truncating v4i64 -> v4i32 and v8i32 -> v8i16 may be done with set of shuffles.

llvm-svn: 149485
2012-02-01 07:56:44 +00:00
Craig Topper 9cdb8bdf04 Don't create VBROADCAST nodes if any nodes use the chain result from the load. Fixes PR11900.
llvm-svn: 149478
2012-02-01 06:51:58 +00:00
Craig Topper b85e40f738 Remove pcmpgt/pcmpeq intrinsics as clang is not using them.
llvm-svn: 149367
2012-01-31 06:52:44 +00:00
Benjamin Kramer 396c590818 Fix refacto.
llvm-svn: 149269
2012-01-30 20:01:35 +00:00
Douglas Gregor e577cfe172 Eliminate narrowing conversion in initializer list, to make C++11 happy
llvm-svn: 149254
2012-01-30 16:57:18 +00:00
Benjamin Kramer 20af25f47b X86: Simplify shuffle mask generation code.
llvm-svn: 149248
2012-01-30 15:16:21 +00:00
Craig Topper 516cba3380 Fix pattern for memory form of PSHUFD for use with FP vectors to remove bitcast to an integer vector that normal code wouldn't have. Also remove bitcasts from code that turns splat vector loads into a shuffle as it was making the broken pattern necessary.
llvm-svn: 149232
2012-01-30 07:50:31 +00:00
Craig Topper ca29bcfc10 Move some XOP patterns into instruction definition. Replae VPCMOV intrinsic patterns with custom lowering to a target specific nodes.
llvm-svn: 149216
2012-01-30 01:10:15 +00:00
Craig Topper b91760eff8 Remove some more patterns by custom lowering intrinsics to target specific nodes.
llvm-svn: 149052
2012-01-26 07:18:03 +00:00
Chris Lattner 33633a90a0 fix a bug I introduced in r148929, this is not a splat!
Thanks to Eli for noticing.

llvm-svn: 148947
2012-01-25 09:56:22 +00:00
Craig Topper 7834900950 Custom lower PSIGN and PSHUFB intrinsics to their corresponding target specific nodes so we can remove the isel patterns.
llvm-svn: 148933
2012-01-25 06:43:11 +00:00
Chris Lattner 47a86bdbe2 use ConstantVector::getSplat in a few places.
llvm-svn: 148929
2012-01-25 06:02:56 +00:00
Craig Topper ce4f9c5668 Custom lower phadd and phsub intrinsics to target specific nodes. Remove the patterns that are no longer necessary.
llvm-svn: 148927
2012-01-25 05:37:32 +00:00
Elena Demikhovsky 0b0c5d8c4c ZERO_EXTEND operation is optimized for AVX.
v8i16 -> v8i32, v4i32 -> v4i64 - used vpunpck* instructions.

llvm-svn: 148803
2012-01-24 13:54:13 +00:00
Craig Topper edd1d0acfc Custom lower PCMPEQ/PCMPGT intrinsics to target specific nodes and remove the intrinsic patterns.
llvm-svn: 148687
2012-01-23 08:18:28 +00:00
Craig Topper 6b90c5d03e Update more places to use target specific nodes for vector shifts instead of intrinsics.
llvm-svn: 148685
2012-01-23 06:46:22 +00:00
Craig Topper 5e80db4e4f Custom lower vector shift intrinsics to target specific nodes and remove the patterns that are no longer needed.
llvm-svn: 148684
2012-01-23 06:16:53 +00:00
Craig Topper 0b7ad76bd0 Combine X86 CMPPD and CMPPS node types. Simplifies selection code and pattern matching.
llvm-svn: 148670
2012-01-22 23:36:02 +00:00
Craig Topper bd4884371b Merge PCMPEQB/PCMPEQW/PCMPEQD/PCMPEQQ and PCMPGTB/PCMPGTW/PCMPGTD/PCMPGTQ X86 ISD node types into only two node types. Simplifying opcode selection and pattern matching.
llvm-svn: 148667
2012-01-22 22:42:16 +00:00
Craig Topper 094626414d Add target specific ISD node types for SSE/AVX vector shuffle instructions and change all the code that used to create intrinsic nodes to create the new nodes instead.
llvm-svn: 148664
2012-01-22 19:15:14 +00:00
Craig Topper a4ed5246d8 Make code a little less verbose.
llvm-svn: 148651
2012-01-22 03:07:48 +00:00
Craig Topper cb3433cd58 Remove unused X86 ISD node type defines.
llvm-svn: 148644
2012-01-22 01:15:56 +00:00
Craig Topper 39bc1e4d25 Fix PR11819 introduced by r148537. I'd commit the test case, but the generated code is terrible as it gets fully scalarized. Expect a future commit to fix that.
llvm-svn: 148632
2012-01-21 08:49:33 +00:00
David Blaikie 46a9f016c5 More dead code removal (using -Wunreachable-code)
llvm-svn: 148578
2012-01-20 21:51:11 +00:00
Craig Topper a409479023 Improve 256-bit shuffle splitting to allow 2 sources in each 128-bit lane. As long as only a single lane of the source is used in the lane in the destination. This makes the splitting match much closer to what happens with 256-bit shuffles when AVX is disabled and only 128-bit XMM is allowed.
llvm-svn: 148537
2012-01-20 09:29:03 +00:00
Craig Topper 3469212c82 Add support for selecting 256-bit PALIGNR.
llvm-svn: 148532
2012-01-20 05:53:00 +00:00
Eli Friedman 32c7c25dcb Support MSVC x86-32 sret convention. PR11688. Patch by Joe Groff.
llvm-svn: 148513
2012-01-20 00:05:46 +00:00
Craig Topper 80576e8d1f Merge 128-bit and 256-bit SHUFPS/SHUFPD handling.
llvm-svn: 148466
2012-01-19 08:19:12 +00:00
Nick Lewycky ecc0084f72 Add a TargetOption for disabling tail calls.
llvm-svn: 148442
2012-01-19 00:34:10 +00:00
Jakob Stoklund Olesen ff482f733b Add experimental -x86-use-regmask command line option.
It adds register mask operands to x86 call instructions.  Once all the
backend passes support register mask operands, this will be permanently
enabled.

llvm-svn: 148438
2012-01-18 23:52:22 +00:00
Nadav Rotem 86c3807b99 Fix warning.
llvm-svn: 148301
2012-01-17 09:31:09 +00:00
Nadav Rotem 86e5390dbf Fix 11769.
In CanXFormVExtractWithShuffleIntoLoad we assumed that EXTRACT_VECTOR_ELT can be later handled by the DAGCombiner.
However, in some cases on AVX, the EXTRACT_VECTOR_ELT is legalized to EXTRACT_SUBVECTOR + EXTRACT_VECTOR_ELT, which
currently is not handled by the DAGCombiner. In this patch I added a check that we only extract from the XMM part.

llvm-svn: 148298
2012-01-17 09:13:19 +00:00
Craig Topper 9cafcd8baa Remove unnecessary AVX check from an assert. hasSSE2 is enough.
llvm-svn: 148295
2012-01-17 08:23:44 +00:00
Craig Topper 37b10ef250 Fix a crasher when PerformShiftCombine receives a BUILD_VECTOR of all UNDEF. Probably could use better handling in DAG combine or getNode. Fixes PR11772.
llvm-svn: 148285
2012-01-17 04:44:50 +00:00
Nadav Rotem 57935243bd [AVX] Optimize x86 VSELECT instructions using SimplifyDemandedBits.
We know that the blend instructions only use the MSB, so if the mask is
sign-extended then we can convert it into a SHL instruction. This is a
common pattern because the type-legalizer sign-extends the i1 type which
is used by the LLVM-IR for the condition.

Added a new optimization in SimplifyDemandedBits for SIGN_EXTEND_INREG -> SHL.

llvm-svn: 148225
2012-01-15 19:27:55 +00:00
Benjamin Kramer 339ced4e34 Return an ArrayRef from ShuffleVectorSDNode::getMask and push it through CodeGen.
llvm-svn: 148218
2012-01-15 13:16:05 +00:00
Craig Topper b1c2ebf6ee use v8i32 as optimal mem type over v8f32 if AVX2 is enabled. Similar to SSE2 vs SSE1.
llvm-svn: 148109
2012-01-13 08:32:21 +00:00
Craig Topper cb7e13d7c0 Make X86 instruction selection use 256-bit VPXOR for build_vector of all ones if AVX2 is enabled. This gives the ExeDepsFix pass a chance to choose FP vs int as appropriate. Also use v8i32 as the type for getZeroVector if AVX2 is enabled. This is consistent with SSE2 using prefering v4i32.
llvm-svn: 148108
2012-01-13 08:12:35 +00:00
Craig Topper 2aa07f832e Fix typo in PerformAddCombine that caused any vector type to be checked for horizontal add/sub if AVX2 is enabled. This caused an assert to fail for non 128/256-bit vectors when done before type legalizing. Fixes PR11749.
llvm-svn: 148096
2012-01-13 05:04:25 +00:00
Elena Demikhovsky 060f6ccdb8 Fixed a bug in LowerVECTOR_SHUFFLE caused assertion failure
lc: X86ISelLowering.cpp:6480: llvm::SDValue llvm::X86TargetLowering::LowerVECTOR_SHUFFLE(llvm::SDValue, llvm::SelectionDAG&) const: Assertion `V1.getOpcode() != ISD::UNDEF&&  "Op 1 of shuffle should not be undef"' failed.
Added a test.

llvm-svn: 148044
2012-01-12 20:33:10 +00:00
Nadav Rotem 0a0a829bea Fix a bug in the AVX 256-bit shuffle code in cases where the splat element is on the boundary of two 128-bit vectors.
The attached testcase was stuck in an endless loop.

llvm-svn: 148027
2012-01-12 15:31:55 +00:00
Rafael Espindola 6635ae1c17 Explicitly set the scale to 1 on some segstack prologue instrs.
Patch by Brian Anderson.

llvm-svn: 147952
2012-01-11 18:14:03 +00:00
Nadav Rotem baae7e4577 Fix a bug in the lowering of BUILD_VECTOR for AVX. SCALAR_TO_VECTOR does not zero untouched elements. Use INSERT_VECTOR_ELT instead.
llvm-svn: 147948
2012-01-11 14:07:51 +00:00
Lang Hames 995c63329a Fixed order of operands in comment to match code.
llvm-svn: 147890
2012-01-10 22:53:20 +00:00
Bill Wendling d5ab02600e For i386, don't use the generic code.
As the comment around 7746 says, it's better to use the x87 extended precision
here than SSE. And the generic code doesn't know how to do that. It also regains
the speed lost for the uint64_to_float.c testcase.
<rdar://problem/10669858>

llvm-svn: 147869
2012-01-10 19:41:30 +00:00
Craig Topper 430f3f1bd6 Fix a crash in AVX2 when trying to broadcast a double into a 128-bit vector. There is no vbroadcastsd xmm, but we do need to support 64-bit integers broadcasted into xmm. Also factor the AVX check into the isVectorBroadcast function. This makes more sense since the AVX2 check was already inside.
llvm-svn: 147844
2012-01-10 08:23:59 +00:00
Craig Topper b0c0f72ae6 Remove hasXMM/hasXMMInt functions. Move callers to hasSSE1/hasSSE2. This is the final piece to remove the AVX hack that disabled SSE.
llvm-svn: 147843
2012-01-10 06:54:16 +00:00
Craig Topper d97bbd7b60 Remove hasSSE*orAVX functions and change all callers to use just hasSSE*. AVX is now an SSE level and no longer disables SSE checks.
llvm-svn: 147842
2012-01-10 06:37:29 +00:00
Craig Topper 210e4f81b3 Change some places that were checking for AVX OR SSE1/2 to use hasXMM/hasXMMInt instead. Also fix one place that checked SSE3, but accidentally excluded AVX to use hasSSE3orAVX. This is a step towards removing the AVX hack from the X86Subtarget.h
llvm-svn: 147764
2012-01-09 02:28:15 +00:00
Victor Umansky 540651cf59 Reverted commit #147601 upon Evan's request.
llvm-svn: 147748
2012-01-08 17:20:33 +00:00
Benjamin Kramer 6898db6269 Remove VectorExtras. This unused helper was written for a type of API that is discouraged now.
llvm-svn: 147738
2012-01-07 19:42:13 +00:00
Craig Topper ca66bba45e Remove unnecessary check of hasAVX(). It's already included in hasXMM().
llvm-svn: 147734
2012-01-07 18:48:43 +00:00
Eric Christopher c206d46709 Make the 'x' constraint work for AVX registers as well.
Fixes rdar://10614894

llvm-svn: 147704
2012-01-07 01:02:09 +00:00
Victor Umansky 9255b6d9fe Peephole optimization of ptest-conditioned branch in X86 arch. Performs instruction combining of sequences generated by ptestz/ptestc intrinsics to ptest+jcc pair for SSE and AVX.
Testing: passed 'make check' including LIT tests for all sequences being handled (both SSE and AVX)

Reviewers: Evan Cheng, David Blaikie, Bruno Lopes, Elena Demikhovsky, Chad Rosier, Anton Korobeynikov
llvm-svn: 147601
2012-01-05 08:46:19 +00:00
Bill Wendling ac27f0c830 Replace the uint64_t -> double convertion algorithm with one that's more efficient.
This small bit of ASM code is sufficient to do what the old algorithm did:

     movq       %rax,  %xmm0
     punpckldq  (c0),  %xmm0  // c0: (uint4){ 0x43300000U, 0x45300000U, 0U, 0U }
     subpd      (c1),  %xmm0  // c1: (double2){ 0x1.0p52, 0x1.0p52 * 0x1.0p32 }
   #ifdef __SSE3__
     haddpd   %xmm0, %xmm0          
   #else
     pshufd   $0x4e, %xmm0, %xmm1 
     addpd    %xmm1, %xmm0
   #endif

It's arguably faster. One caveat, the 'haddpd' instruction isn't very fast on
all processors.
<rdar://problem/7719814>

llvm-svn: 147593
2012-01-05 02:13:20 +00:00
Evan Cheng 104dbb0fd1 For x86, canonicalize max
(x > y) ? x : y
=>
(x >= y) ? x : y

So for something like
(x - y) > 0 : (x - y) ? 0
It will be
(x - y) >= 0 : (x - y) ? 0

This makes is possible to test sign-bit and eliminate a comparison against
zero. e.g.
subl   %esi, %edi
testl  %edi, %edi
movl   $0, %eax
cmovgl %edi, %eax
=>
xorl   %eax, %eax
subl   %esi, $edi
cmovsl %eax, %edi

rdar://10633221

llvm-svn: 147512
2012-01-04 01:41:39 +00:00
Chad Rosier 6ca97df951 Fix 80-column violations.
llvm-svn: 147495
2012-01-03 23:19:12 +00:00
Nadav Rotem 6d31bac85e Revert 147426 because it caused pr11696.
llvm-svn: 147485
2012-01-03 22:19:42 +00:00
Chad Rosier 493c1b3152 Enhance DAGCombine for transforming 128->256 casts into a vmovaps, rather
then a vxorps + vinsertf128 pair if the original vector came from a load.
rdar://10594409

llvm-svn: 147481
2012-01-03 21:05:52 +00:00
Craig Topper 5bacb7e9e5 Miscellaneous shuffle lowering cleanup. No functional changes. Primarily converting the indexing loops to unsigned to be consistent across functions.
llvm-svn: 147430
2012-01-02 09:17:37 +00:00
Craig Topper 53d559641f Make CanXFormVExtractWithShuffleIntoLoad reject loads with multiple uses. Also make it return false if there's not even a load at all. This makes the code better match the code in DAGCombiner that it tries to match. These two changes prevent some cases where vector_shuffles were making it to instruction selection and causing the older shuffle selection code to be triggered. Also needed to fix a bad pattern that this change exposed. This is the first step towards getting rid of the old shuffle selection support. No test cases yet because there's no way to tell whether a shuffle was handled in the legalize stage or at instruction selection.
llvm-svn: 147428
2012-01-02 08:46:48 +00:00
Nadav Rotem 6c7a0e6c8b Optimize the sequence blend(sign_extend(x)) to blend(shl(x)) since SSE blend instructions only look at the highest bit.
llvm-svn: 147426
2012-01-02 08:05:46 +00:00
Craig Topper 6e54ba7eee Merge X86 SHUFPS and SHUFPD node types.
llvm-svn: 147394
2011-12-31 23:50:21 +00:00
Craig Topper 0fdf720ded Make LowerBUILD_VECTOR keep node vector types consistent when creating MOVL for v16i16 and v32i8.
llvm-svn: 147337
2011-12-29 03:34:54 +00:00
Craig Topper 862c9b65be Remove some elses after returns.
llvm-svn: 147336
2011-12-29 03:20:51 +00:00
Craig Topper 274e20a499 Remove trailing spaces. Fix an assert to use && instead of || before string. Add same assert on similar code path.
llvm-svn: 147335
2011-12-29 03:09:33 +00:00
Eli Friedman 3a01ddb7e9 Fix type-checking for load transformation which is not legal on floating-point types. PR11674.
llvm-svn: 147323
2011-12-28 21:24:44 +00:00
Elena Demikhovsky b3515a8d4b Fixed a bug in LowerVECTOR_SHUFFLE and LowerBUILD_VECTOR.
Matching MOVLP mask for AVX (265-bit vectors) was wrong.
The failure was detected by conformance tests.

llvm-svn: 147308
2011-12-28 08:14:01 +00:00
Craig Topper df34d152bd Add handling of x86_avx2_pmovmskb to computeMaskedBitsForTargetNode for consistency. Add comments and an assert for BMI instructions to PerformXorCombine since the enabling of the combine is conditional on it, but the function itself isn't.
llvm-svn: 147287
2011-12-27 06:27:23 +00:00
Chandler Carruth a3d54fe0ae Use standard promotion for i8 CTTZ nodes and i8 CTLZ nodes when the
LZCNT instructions are available. Force promotion to i32 to get
a smaller encoding since the fix-ups necessary are just as complex for
either promoted type

We can't do standard promotion for CTLZ when lowering through BSR
because it results in poor code surrounding the 'xor' at the end of this
instruction. Essentially, if we promote the entire CTLZ node to i32, we
end up doing the xor on a 32-bit CTLZ implementation, and then
subtracting appropriately to get back to an i8 value. Instead, our
custom logic just uses the knowledge of the incoming size to compute
a perfect xor. I'd love to know of a way to fix this, but so far I'm
drawing a blank. I suspect the legalizer could be more clever and/or it
could collude with the DAG combiner, but how... ;]

llvm-svn: 147251
2011-12-24 12:12:34 +00:00
Chandler Carruth 38ce24455d Add systematic testing for cttz as well, and fix the bug I spotted by
inspection earlier.

llvm-svn: 147250
2011-12-24 11:46:10 +00:00
Chandler Carruth c9fcde2347 Expand more when we have a nice 'tzcnt' instruction, to avoid generating
'bsf' instructions here.

This one is actually debatable to my eyes. It's not clear that any chip
implementing 'tzcnt' would have a slow 'bsf' for any reason, and unless
EFLAGS or a zero input matters, 'tzcnt' is just a longer encoding.
Still, this restores the old behavior with 'tzcnt' enabled for now.

llvm-svn: 147246
2011-12-24 11:11:38 +00:00
Chandler Carruth 7e9453e916 Switch the lowering of CTLZ_ZERO_UNDEF from a .td pattern back to the
X86ISelLowering C++ code. Because this is lowered via an xor wrapped
around a bsr, we want the dagcombine which runs after isel lowering to
have a chance to clean things up. In particular, it is very common to
see code which looks like:

  (sizeof(x)*8 - 1) ^ __builtin_clz(x)

Which is trying to compute the most significant bit of 'x'. That's
actually the value computed directly by the 'bsr' instruction, but if we
match it too late, we'll get completely redundant xor instructions.

The more naive code for the above (subtracting rather than using an xor)
still isn't handled correctly due to the dagcombine getting confused.

Also, while here fix an issue spotted by inspection: we should have been
expanding the zero-undef variants to the normal variants when there is
an 'lzcnt' instruction. Do so, and test for this. We don't want to
generate unnecessary 'bsr' instructions.

These two changes fix some regressions in encoding and decoding
benchmarks. However, there is still a *lot* to be improve on in this
type of code.

llvm-svn: 147244
2011-12-24 10:55:54 +00:00
Chad Rosier 00bbedff03 Fix 80-column violations.
llvm-svn: 147192
2011-12-22 22:35:21 +00:00
Chad Rosier 3ede414127 No case stmt for BUILD_VECTOR in PerformDAGCombine(), so I assume this isn't
necessary.  Please chime in if I'm mistaken.

llvm-svn: 147065
2011-12-21 19:14:52 +00:00
Chandler Carruth 24680c24d8 Begin teaching the X86 target how to efficiently codegen patterns that
use the zero-undefined variants of CTTZ and CTLZ. These are just simple
patterns for now, there is more to be done to make real world code using
these constructs be optimized and codegen'ed properly on X86.

The existing tests are spiffed up to check that we no longer generate
unnecessary cmov instructions, and that we generate the very important
'xor' to transform bsr which counts the index of the most significant
one bit to the number of leading (most significant) zero bits. Also they
now check that when the variant with defined zero result is used, the
cmov is still produced.

llvm-svn: 146974
2011-12-20 11:19:37 +00:00
Benjamin Kramer 1b54835a10 Another variadics tweak.
llvm-svn: 146852
2011-12-18 20:51:31 +00:00
Benjamin Kramer 530b820500 Use the fancy new VariadicFunction template instead of a plain variadic function.
Some compilers were complaining about passing StringRef to it.

llvm-svn: 146850
2011-12-18 19:59:20 +00:00
Craig Topper a913dde0ef Remove an unused X86ISD node type.
llvm-svn: 146833
2011-12-17 19:16:44 +00:00
Benjamin Kramer 792edd3c75 X86: Factor the bswap asm matching to be slightly less horrible to read.
llvm-svn: 146831
2011-12-17 14:36:05 +00:00
Lang Hames da07b3ad42 Make sure that the lower bits on the VSELECT condition are properly set.
llvm-svn: 146800
2011-12-17 01:08:46 +00:00
Craig Topper a4d411cb1b Don't try to match 'unpackl/h v, v' for 32xi8 and 16xi16 when only AVX1 is supported. Fix 'unpackh v, v' for 256-bit types to understand 128-bit lanes.
llvm-svn: 146726
2011-12-16 08:06:31 +00:00
Chad Rosier 75ed9dcbc6 Fix assert in LowerBUILD_VECTOR for v16i16 type on AVX.
Patch by Elena Demikhovsky <elena.demikhovsky@intel.com>!

llvm-svn: 146684
2011-12-15 21:34:44 +00:00
Lang Hames c44b5e469b Fix VSELECT operand order. Was previously backwards, causing bogus vector shift results - <rdar://problem/10559581>.
llvm-svn: 146671
2011-12-15 18:57:27 +00:00
Chad Rosier b7a0b89ff0 Use SmallVector/assign(), rather than std::vector/push_back().
llvm-svn: 146627
2011-12-15 01:16:09 +00:00
Chad Rosier 1940baa76b Add support for lowering fneg when AVX is enabled.
rdar://10566486

llvm-svn: 146625
2011-12-15 01:02:25 +00:00
Chandler Carruth 637cc6a8aa Initial CodeGen support for CTTZ/CTLZ where a zero input produces an
undefined result. This adds new ISD nodes for the new semantics,
selecting them when the LLVM intrinsic indicates that the undef behavior
is desired. The new nodes expand trivially to the old nodes, so targets
don't actually need to do anything to support these new nodes besides
indicating that they should be expanded. I've done this for all the
operand types that I could figure out for all the targets. Owners of
various targets, please review and let me know if any of these are
incorrect.

Note that the expand behavior is *conservatively correct*, and exactly
matches LLVM's current behavior with these operations. Ideally this
patch will not change behavior in any way. For example the regtest suite
finds the exact same instruction sequences coming out of the code
generator. That's why there are no new tests here -- all of this is
being exercised by the existing test suite.

Thanks to Duncan Sands for reviewing the various bits of this patch and
helping me get the wrinkles ironed out with expanding for each target.
Also thanks to Chris for clarifying through all the discussions that
this is indeed the approach he was looking for. That said, there are
likely still rough spots. Further review much appreciated.

llvm-svn: 146466
2011-12-13 01:56:10 +00:00
Craig Topper 1fdfec63a4 Remove some remants of the old palign pattern fragment that were still hanging around. Also remove a cast from inside getShuffleVPERM2X128Immediate and getShuffleVPERMILPImmediate since the only caller already had done the cast.
llvm-svn: 146344
2011-12-11 19:12:35 +00:00
Benjamin Kramer 16bbfbec66 X86: Add patterns for the various rounding ops for SSE4.1 and AVX.
llvm-svn: 146257
2011-12-09 15:44:03 +00:00
Owen Anderson 57a7f41d5d Don't explicitly marked libm rounding ops as legal on SSE4.1/AVX. There don't seem to be patterns for these, so I don't know why they were marked legal in the first place.
Fixes failures caused by r146171.

llvm-svn: 146180
2011-12-08 20:51:38 +00:00
Owen Anderson 0b9b9da6c8 Teach SelectionDAG to match more calls to libm functions onto existing SDNodes. Mark these nodes as illegal by default, unless the target declares otherwise.
llvm-svn: 146171
2011-12-08 19:32:14 +00:00
Craig Topper 83320e03e6 Add X86ISD::HADD/HSUB to getTargetNodeName
llvm-svn: 145929
2011-12-06 09:31:36 +00:00
Craig Topper 8d4ba198d6 Merge floating point and integer UNPCK X86ISD node types.
llvm-svn: 145926
2011-12-06 08:21:25 +00:00
Craig Topper 3cb802c775 Clean up some of the shuffle decoding code for UNPCK instructions. Add instruction commenting for AVX/AVX2 forms for integer UNPCKs.
llvm-svn: 145924
2011-12-06 05:31:16 +00:00
Craig Topper bf41eb3a98 Merge isSHUFPMask and isCommutedSHUFPMask into single function that can do both. Do the same for the 256-bit version. Use loops to reduce size of isVSHUFPYMask. Fix test cases that were incorrectly passing due to isCommutedSHUFPMask not checking for the vector being 128-bit. This caused some 256-bit shuffles to be incorrectly commuted.
llvm-svn: 145921
2011-12-06 04:59:07 +00:00
Jakob Stoklund Olesen 10e1252269 Use logarithmic units for basic block alignment.
This was actually a bit of a mess. TLI.setPrefLoopAlignment was clearly
documented as taking log2(bytes) units, but the x86 target would still
set a preferred loop alignment of '16'.

CodePlacementOpt passed this number on to the basic block, and
AsmPrinter interpreted it as bytes.

Now both MachineFunction and MachineBasicBlock use logarithmic
alignments.

Obviously, MachineConstantPool still measures alignments in bytes, so we
can emulate the thrill of using as.

llvm-svn: 145889
2011-12-06 01:26:19 +00:00
Craig Topper 51bec1a37a Remove some leftover remnants that once tried to create 64-bit MMX PALIGNR instructions.
llvm-svn: 145804
2011-12-05 07:27:14 +00:00
Craig Topper 6a55b1dd9f Clean up and optimizations to the X86 shuffle lowering code. No functional change.
llvm-svn: 145803
2011-12-05 06:56:46 +00:00
Nick Lewycky 50f02cb21b Move global variables in TargetMachine into new TargetOptions class. As an API
change, now you need a TargetOptions object to create a TargetMachine. Clang
patch to follow.

One small functionality change in PTX. PTX had commented out the machine
verifier parts in their copy of printAndVerify. That now calls the version in
LLVMTargetMachine. Users of PTX who need verification disabled should rely on
not passing the command-line flag to enable it.

llvm-svn: 145714
2011-12-02 22:16:29 +00:00
Craig Topper b67440367f Reduce duplicate code in isHorizontalBinOp and add some asserts to protect assumptions
llvm-svn: 145681
2011-12-02 08:18:41 +00:00
Craig Topper abeb79eee3 Add instruction selection support for horizontal add/sub of 256-bit floating point vectors. Also add the test case for 256-bit integer vectors.
llvm-svn: 145680
2011-12-02 07:16:01 +00:00
Nadav Rotem 96923cc2bb X86: PerformOrCombine introduced a vselect node with a wrong order of operands. This bug was introduced when a dedicated blend sdnode was replaced with the vselect node (in 139479).
llvm-svn: 145488
2011-11-30 10:13:37 +00:00
Craig Topper c4977ba413 Add instruction selection support for AVX2 horizontal add/sub instructions.
llvm-svn: 145487
2011-11-30 09:10:50 +00:00
Craig Topper 0a672eaf9e Merge VPERM2F128/VPERM2I128 ISD node types.
llvm-svn: 145485
2011-11-30 07:47:51 +00:00
Craig Topper bafd224c8b Merge decoding of VPERMILPD and VPERMILPS shuffle masks. Merge X86ISD node type for VPERMILPD/PS. Add instruction selection support for VINSERTI128/VEXTRACTI128.
llvm-svn: 145483
2011-11-30 06:25:25 +00:00
Craig Topper c16db840be Fix issues in shuffle decoding around VPERM* instructions. Fix shuffle decoding for VSHUFPS/D for 256-bit types. Add pattern matching for memory forms of VPERMILPS/VPERMILPD.
llvm-svn: 145390
2011-11-29 07:49:05 +00:00
Craig Topper 818a983e93 Add X86 instruction selection for VPERM2I128 when AVX2 is enabled. Merge VPERMILPS/VPERMILPD detection since they are pretty similar.
llvm-svn: 145238
2011-11-28 10:14:51 +00:00
Craig Topper b0456936da Make isCommutedVSHUFP more like the way isCommutedSHUFP is handled.
llvm-svn: 145218
2011-11-28 01:14:24 +00:00
Craig Topper 79ee88a511 Merge detecting and handling for VSHUFPSY and VSHUFPDY since a lot of the code was similar for both.
llvm-svn: 145199
2011-11-27 21:41:12 +00:00
Craig Topper 51280d565b Merge 128-bit and 256-bit X86ISD node types for VPERMILPS and VPERMILPD. Simplify some shuffle lowering code since V1 can never be UNDEF due to canonalizing that occurs when shuffle nodes are created.
llvm-svn: 145153
2011-11-26 22:55:48 +00:00
Craig Topper 7704bd7ac3 Collapse X86ISD node types for PUNPCKH*, PUNPCKL*, UNPCKLP*, and UNPCKHP* to not be type specific. Now we just have integer high and low and floating point high and low. Pattern matching will choose the correct instruction based on the vector type.
llvm-svn: 145148
2011-11-26 20:47:44 +00:00
Craig Topper d65a444478 Remove 256-bit specific node types for UNPCKHPS/D and instead use the 128-bit versions and let the operand type disinquish. Also fix the load form of the v8i32 patterns for these to realize that the load would be promoted to v4i64.
llvm-svn: 145126
2011-11-24 22:57:10 +00:00
Craig Topper d26466748b Remove AVX2 specific X86ISD node types for PUNPCKH/L and instead just reuse the 128-bit versions and let the vector type distinguish.
llvm-svn: 145125
2011-11-24 22:20:08 +00:00
Benjamin Kramer ebcb451874 X86: Use btq for bit tests if the immediate can't be encoded in 32 bits.
Before:
	movabsq	$4294967296, %rax       ## encoding: [0x48,0xb8,0x00,0x00,0x00,0x00,0x01,0x00,0x00,0x00]
	testq	%rax, %rdi              ## encoding: [0x48,0x85,0xf8]
	jne	LBB0_2                  ## encoding: [0x75,A]

After:
	btq	$32, %rdi               ## encoding: [0x48,0x0f,0xba,0xe7,0x20]
	jb	LBB0_2                  ## encoding: [0x72,A]

btq is usually slower than testq because it doesn't fuse with the jump, but here we're better off
saving one register and a giant movabsq.

llvm-svn: 145103
2011-11-23 13:54:17 +00:00
Elena Demikhovsky 779ba6d7b7 I added several lines in X86 code generator that allow to choose
VSHUFPS/VSHUFPD instructions while lowering VECTOR_SHUFFLE node. I check a commuted VSHUFP mask.

The patch was reviewed by Bruno.

llvm-svn: 145099
2011-11-23 10:23:16 +00:00
Craig Topper ccb7097509 Fix shuffle decoding logic to handle UNPCKLPS/UNPCKLPD on 256-bit vectors correctly. Add support for decoding UNPCKHPS/UNPCKHPD for AVX 128-bit and 256-bit forms.
llvm-svn: 145055
2011-11-22 01:57:35 +00:00
Craig Topper f563977795 Add methods for querying minimum SSE version along with AVX. Simplifies all the places that had to check a version of SSE and AVX.
llvm-svn: 145053
2011-11-22 00:44:41 +00:00
Craig Topper 6270d072c5 Lowering for v32i8 to VPUNPCKLBW/VPUNPCKHBW when AVX2 is enabled.
llvm-svn: 145028
2011-11-21 08:26:50 +00:00
Craig Topper 669199ca94 Add support for lowering 256-bit shuffles to VPUNPCKL/H for i16, i32, i64 if AVX2 is enabled.
llvm-svn: 145026
2011-11-21 06:57:39 +00:00
Craig Topper a065238c6e Make LowerSIGN_EXTEND_INREG split 256-bit vectors when AVX1 is enabled and use AVX2 shifts when AVX2 is enabled.
llvm-svn: 145022
2011-11-21 01:12:36 +00:00
Craig Topper e79761df73 Add code for lowering v32i8 shifts by a splat to AVX2 immediate shift instructions. Remove 256-bit splat handling from LowerShift as it was already handled by PerformShiftCombine.
llvm-svn: 145005
2011-11-20 00:12:05 +00:00
Craig Topper a3a6583694 Use 256-bit vcmpeqd for creating an all ones vector when AVX2 is enabled.
llvm-svn: 145004
2011-11-19 22:34:59 +00:00
Craig Topper 3af6ae089f Custom lower AVX2 variable shift intrinsics to shl/srl/sra nodes and remove the intrinsic patterns.
llvm-svn: 144999
2011-11-19 17:46:46 +00:00
Craig Topper f984efbfce Synthesize SSSE3/AVX 128-bit horizontal integer add/sub instructions from add/sub of appropriate shuffle vectors.
llvm-svn: 144989
2011-11-19 09:02:40 +00:00
Craig Topper 81390be00f Collapse X86 PSIGNB/PSIGNW/PSIGND node types.
llvm-svn: 144988
2011-11-19 07:33:10 +00:00
Craig Topper de6b73bb4d Extend VPBLENDVB and VPSIGN lowering to work for AVX2.
llvm-svn: 144987
2011-11-19 07:07:26 +00:00
Nadav Rotem 1ec141d0f9 Add AVX2 vpbroadcast support
llvm-svn: 144967
2011-11-18 02:49:55 +00:00
Nadav Rotem 37010002f2 AVX: Add support for vbroadcast from BUILD_VECTOR and refactor some of the vbroadcast code.
llvm-svn: 144720
2011-11-15 22:50:37 +00:00
Pete Cooper 7c7ba1baa1 Added custom lowering for load->dec->store sequence in x86 when the EFLAGS registers is used
by later instructions.

Only done for DEC64m right now.

Fixes <rdar://problem/6172640>

llvm-svn: 144705
2011-11-15 21:57:53 +00:00
Jay Foad 0745e645e0 Remove some unnecessary includes of PseudoSourceValue.h.
llvm-svn: 144631
2011-11-15 07:24:32 +00:00
Pete Cooper 890e02e854 Changed SSE4/AVX <2 x i64> extract and insert ops to be Custom lowered
Constant idx case is still done in tablegen but other cases are then expanded

Fixes <rdar://problem/10435460>

llvm-svn: 144557
2011-11-14 19:38:42 +00:00
Craig Topper a331515c82 Add neverHasSideEffects, mayLoad, and mayStore to many patternless SSE/AVX instructions. Remove MMX check from LowerVECTOR_SHUFFLE since MMX vector types won't go through it anyway.
llvm-svn: 144522
2011-11-14 06:46:21 +00:00
Craig Topper b8bcb473e2 Add BLSI, BLSMSK, and BLSR to getTargetNodeName.
llvm-svn: 144502
2011-11-13 17:31:07 +00:00
Craig Topper 3dc75f9e3b Add more AVX2 shift lowering support. Move AVX2 variable shift to use patterns instead of custom lowering code.
llvm-svn: 144457
2011-11-12 09:58:49 +00:00
Craig Topper ea28a34c43 Add lowering for AVX2 shift instructions.
llvm-svn: 144380
2011-11-11 07:39:23 +00:00
Nadav Rotem 1938482bfa AVX2: Add patterns for variable shift operations
llvm-svn: 144212
2011-11-09 21:22:13 +00:00
Nadav Rotem 79135d844d Add AVX2 support for vselect of v32i8
llvm-svn: 144187
2011-11-09 13:21:28 +00:00
Craig Topper c9eb09d3b8 Add instruction selection for AVX2 integer comparisons.
llvm-svn: 144176
2011-11-09 08:06:13 +00:00
Craig Topper 8c8a431057 Add AVX2 instruction lowering for add, sub, and mul.
llvm-svn: 144174
2011-11-09 07:28:55 +00:00
Pete Cooper 82cd9e81fc Added invariant field to the DAG.getLoad method and changed all calls.
When this field is true it means that the load is from constant (runt-time or compile-time) and so can be hoisted from loops or moved around other memory accesses

llvm-svn: 144100
2011-11-08 18:42:53 +00:00
Evan Cheng 91b56e0390 Add x86 isel logic and patterns to match movlps from clang generated IR for _mm_loadl_pi(). rdar://10134392, rdar://10050222
llvm-svn: 144052
2011-11-08 00:31:58 +00:00
Dan Gohman 198b7ffc11 Reapply r143206, with fixes. Disallow physical register lifetimes
across calls, and only check for nested dependences on the special
call-sequence-resource register.

llvm-svn: 143660
2011-11-03 21:49:52 +00:00
Eli Friedman 3f5eccbe7a Teach the x86 backend a couple tricks for dealing with v16i8 sra by a constant splat value. Fixes PR11289.
llvm-svn: 143498
2011-11-01 21:18:39 +00:00
Benjamin Kramer 7402ee6ec2 X86: Emit logical shift by constant splat of <16 x i8> as a <8 x i16> shift and zero out the bits where zeros should've been shifted in.
llvm-svn: 143315
2011-10-30 17:31:21 +00:00