Jakub Staszak
da03f3ba64
Remove unneeded casts. No functionality change.
...
llvm-svn: 155800
2012-04-29 20:52:53 +00:00
Craig Topper
3b94fa63d6
Simplify code a bit. No functional change intended.
...
llvm-svn: 155798
2012-04-29 20:22:05 +00:00
Craig Topper
0fa6c7e593
Use 'unsigned' instead of 'int' in several places when retrieving number of vector elements.
...
llvm-svn: 155742
2012-04-27 22:54:43 +00:00
Chad Rosier
32c2178ef3
Add x86-specific DAG combine to simplify:
...
x == -y --> x+y == 0
x != -y --> x+y != 0
On x86, the generated code goes from
negl %esi
cmpl %esi, %edi
je .LBB0_2
to
addl %esi, %edi
je .L4
This case is correctly handled for ARM with "cmn".
Patch by Manman Ren.
rdar://11245199
PR12545
llvm-svn: 155739
2012-04-27 22:33:25 +00:00
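The combine in r155739 rests on a two's-complement identity: with wrapping arithmetic, x == -y holds exactly when x + y == 0. The sketch below is not taken from the patch. It checks the identity exhaustively for 8-bit values; the helper names and the 8-bit width are assumptions chosen to keep the search small.

```python
BITS = 8                 # assumed width; the x86 case in the log is 32-bit
MASK = (1 << BITS) - 1


def neg(y):
    """Two's-complement negation, wrapped to BITS bits (what negl computes)."""
    return (-y) & MASK


def add(x, y):
    """Wrapping addition, as an add instruction would compute it."""
    return (x + y) & MASK


def identity_holds():
    """Check x == -y  <=>  x + y == 0 for every pair of BITS-wide values."""
    return all((x == neg(y)) == (add(x, y) == 0)
               for x in range(1 << BITS)
               for y in range(1 << BITS))
```

Because the add already sets the zero flag, the separate negate and compare become unnecessary, which is the saving shown in the before/after assembly.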
Craig Topper
42cd8d2c00
Tidy up spacing.
...
llvm-svn: 155733
2012-04-27 21:05:09 +00:00
Benjamin Kramer
913da4b261
X86: Don't emit conditional floating point moves when targeting pre-pentiumpro architectures.
...
* Model FPSW (the FPU status word) as a register.
* Add ISel patterns for the FUCOM*, FNSTSW and SAHF instructions.
* During Legalize/Lowering, build a node sequence to transfer the comparison
result from FPSW into EFLAGS. If you're wondering about the right-shift: That's
an implicit sub-register extraction (%ax -> %ah) which is handled later on by
the instruction selector.
Fixes PR6679. Patch by Christoph Erhardt!
llvm-svn: 155704
2012-04-27 12:07:43 +00:00
Craig Topper
5ff6dc34b9
Use vector_shuffles instead of target specific unpack nodes for AVX ZERO_EXTEND/ANY_EXTEND combine. These will be converted to target specific nodes during lowering. This is more consistent with other code.
...
llvm-svn: 155537
2012-04-25 06:39:39 +00:00
Nadav Rotem
7b7b99c74a
AVX2: The BLENDPW instruction selects between vectors of v16i16 using an i8
...
immediate. We can't use it here because the shuffle code does not check that
the lower part of the word is identical to the upper part.
llvm-svn: 155440
2012-04-24 11:27:53 +00:00
Craig Topper
0b65c40821
Remove dangling spaces. Fix some other formatting.
...
llvm-svn: 155429
2012-04-24 06:36:35 +00:00
Craig Topper
6f2a535de2
Simplify code a bit and make it compile better. Remove unused parameters.
...
llvm-svn: 155428
2012-04-24 06:02:29 +00:00
Nadav Rotem
3f8acfc3c4
Optimize the vector UINT_TO_FP, SINT_TO_FP and FP_TO_SINT operations where the integer type is i8 (commonly used in graphics).
...
llvm-svn: 155397
2012-04-23 21:53:37 +00:00
Craig Topper
153bb34a3c
Use MVT instead of EVT through all of LowerVECTOR_SHUFFLEtoBlend and not just the switch. Saves a little bit of binary size.
...
llvm-svn: 155339
2012-04-23 07:36:33 +00:00
Craig Topper
0a2c809d09
Make getZeroVector and getOnesVector more alike as far as how they detect 128-bit versus 256-bit vectors. Be explicit about both sizes and use llvm_unreachable. Similar changes to getLegalSplat.
...
llvm-svn: 155337
2012-04-23 07:24:41 +00:00
Craig Topper
2bbe8bcf4e
Tidy up by removing some 'else' after 'return'
...
llvm-svn: 155336
2012-04-23 06:57:04 +00:00
Craig Topper
5c51eeecfc
Tidy up spacing in LowerVECTOR_SHUFFLEtoBlend. Remove code that checks if shuffle operand has a different type than the shuffle result since it can never happen.
...
llvm-svn: 155333
2012-04-23 06:38:28 +00:00
Craig Topper
a52f0d09b6
Add a couple llvm_unreachables.
...
llvm-svn: 155332
2012-04-23 03:42:40 +00:00
Craig Topper
984dc015ae
Remove some tab characters.
...
llvm-svn: 155331
2012-04-23 03:28:34 +00:00
Craig Topper
ea428fd79c
Remove some 'else' after 'return'. No functional change.
...
llvm-svn: 155330
2012-04-23 03:26:18 +00:00
Craig Topper
bf7d5666f0
Make Extract128BitVector and Insert128BitVector take an unsigned instead of a ConstantNode SDValue. getConstant was almost always called just before only to have the functions take it apart and build a new ConstantSDNode.
...
llvm-svn: 155325
2012-04-22 20:55:18 +00:00
Craig Topper
2d474d6d92
Convert getNode(UNDEF) to getUNDEF.
...
llvm-svn: 155321
2012-04-22 19:29:34 +00:00
Craig Topper
860ed0d20a
Make calls to getVectorShuffle more consistent. Use shuffle VT for calls to getUNDEF instead of requerying. Use &Mask[0] instead of Mask.data().
...
llvm-svn: 155320
2012-04-22 19:17:57 +00:00
Craig Topper
43397c0900
Tidy up. 80 columns and argument alignment.
...
llvm-svn: 155319
2012-04-22 18:51:37 +00:00
Craig Topper
ad56a744f1
Simplify code by converting multiple places that were manually concatenating 128-bit vectors to use either CONCAT_VECTORS or a helper function. CONCAT_VECTORS will itself be lowered to the same pattern as before. The helper function is needed for concats of BUILD_VECTORs since getNode(CONCAT_VECTORS) will just return a large BUILD_VECTOR and we may be trying to lower large BUILD_VECTORS when this occurs.
...
llvm-svn: 155318
2012-04-22 18:15:59 +00:00
Elena Demikhovsky
8d7e56c409
ZERO_EXTEND/SIGN_EXTEND/TRUNCATE optimization for AVX2
...
llvm-svn: 155309
2012-04-22 09:39:03 +00:00
Craig Topper
6eadae8e60
Make some fixed arrays const. Use array_lengthof in a couple places instead of a hardcoded number.
...
llvm-svn: 155294
2012-04-21 18:58:38 +00:00
Craig Topper
2568bf3089
Tidy up. 80 columns and some other spacing issues.
...
llvm-svn: 155291
2012-04-21 18:13:35 +00:00
Craig Topper
abadc660e0
Convert some uses of XXXRegisterClass to &XXXRegClass. No functional change since they are equivalent.
...
llvm-svn: 155186
2012-04-20 06:31:50 +00:00
Craig Topper
d3c9e404ba
Remove AVX vpermil intrinsics. I removed their uses from clang headers and builtins a while back.
...
llvm-svn: 154985
2012-04-18 05:24:00 +00:00
Craig Topper
354103d8ca
Don't decode vperm2i128 or vperm2f128 into a shuffle if bit 3 or 7 of the immediate is set.
...
llvm-svn: 154907
2012-04-17 05:54:54 +00:00
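For context on r154907: in the vperm2f128/vperm2i128 immediate, bits 1:0 and 5:4 select the 128-bit source half for the low and high halves of the result, while bits 3 and 7 zero the corresponding half, so an immediate with either bit set cannot be expressed as a plain two-source shuffle. The following is a hypothetical decoder written for illustration, not LLVM's actual decode routine; the function name and the use of None to signal "not a shuffle" are assumptions.

```python
def decode_vperm2x128(imm, num_elts):
    """Return shuffle indices for a 256-bit vector with num_elts elements.

    Indices 0..num_elts-1 refer to the first source and
    num_elts..2*num_elts-1 to the second. Returns None when bit 3 or
    bit 7 is set, because those bits zero a half of the result.
    """
    if imm & 0x88:
        return None
    half = num_elts // 2
    mask = []
    for shift in (0, 4):              # low half first, then high half
        selector = (imm >> shift) & 0x3
        start = selector * half       # 0/1: halves of src1, 2/3: halves of src2
        mask.extend(range(start, start + half))
    return mask
```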
Richard Smith
12da79b859
Fix incorrect atomics codegen introduced in r154705, and extend test to catch it.
...
llvm-svn: 154845
2012-04-16 18:43:53 +00:00
Craig Topper
4badeb3f0d
Replace vpermd/vpermps intrinsic patterns with custom lowering to target specific nodes.
...
llvm-svn: 154801
2012-04-16 07:13:00 +00:00
Craig Topper
26d7a94981
Change type profile for vpermv back to using operand type for the mask argument to match intrinsic behavior. Add a bitcast to the lowering code to convert mask from v8i32 to v8f32 for vpermps.
...
llvm-svn: 154798
2012-04-16 06:43:40 +00:00
Craig Topper
b86fa404d3
Merge vpermps/vpermd and vpermpd/vpermq SD nodes.
...
llvm-svn: 154782
2012-04-16 00:41:45 +00:00
Craig Topper
1f8c9eb925
Spacing fixes and 80 column fixes. Use 0 instead of 0x80 for undef indices in vpermps/vpermd. Hardware only looks at lower 3 bits.
...
llvm-svn: 154780
2012-04-15 23:48:57 +00:00
Elena Demikhovsky
779a72b49e
Added VPERM optimization for AVX2 shuffles
...
llvm-svn: 154761
2012-04-15 11:18:59 +00:00
Richard Smith
3e8f1f6aea
Fix X86 codegen for 'atomicrmw nand' to generate *x = ~(*x & y), not *x = ~*x & y.
...
llvm-svn: 154705
2012-04-13 22:47:00 +00:00
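The two expressions named in r154705 differ only in where the complement is applied. A small sketch on 8-bit values makes the difference concrete; the function names and the 8-bit width are assumptions for illustration, and nothing here models atomicity.

```python
MASK = 0xFF  # assumed 8-bit width, to keep the values readable


def nand_correct(x, y):
    """NAND as described in the fix: ~(x & y)."""
    return ~(x & y) & MASK


def nand_buggy(x, y):
    """The earlier, incorrect form: ~x & y."""
    return (~x & y) & MASK
```

For x = 0b1100 and y = 0b1010 the correct form gives 0xF7 while the incorrect form gives 0x02.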
Nadav Rotem
372cf15125
remove unused argument
...
llvm-svn: 154494
2012-04-11 11:05:21 +00:00
Nadav Rotem
9bc178ac5c
Reapply 154396 after fixing a test.
...
Original message:
Modify the code that lowers shuffles to blends from using blendvXX to vblendXX.
blendV uses a register for the selection while Vblend uses an immediate.
On sandybridge they still have the same latency and execute on the same execution ports.
llvm-svn: 154483
2012-04-11 06:40:27 +00:00
Chad Rosier
f7345b027a
Whitespace.
...
llvm-svn: 154427
2012-04-10 19:42:07 +00:00
Chad Rosier
235a7a1746
Revert r154396, which looks to be the real culprit behind the bot failures.
...
llvm-svn: 154426
2012-04-10 19:39:18 +00:00
Eric Christopher
65ada95b84
Temporarily revert this patch to see if it brings the buildbots back.
...
llvm-svn: 154425
2012-04-10 19:33:16 +00:00
David Blaikie
2735136655
Remove unused variable.
...
llvm-svn: 154398
2012-04-10 15:23:13 +00:00
Nadav Rotem
f934f91709
Modify the code that lowers shuffles to blends from using blendvXX to vblendXX.
...
blendv uses a register for the selection while vblend uses an immediate.
On sandybridge they still have the same latency and execute on the same execution ports.
llvm-svn: 154396
2012-04-10 14:33:13 +00:00
Evan Cheng
f8bad08001
Fix a long standing tail call optimization bug. When a libcall is emitted
...
the legalizer always uses the DAG entry node. This is wrong when the libcall is
emitted as a tail call since it effectively folds the return node. If
the return node's input chain is not the entry (i.e. call, load, or store)
use that as the tail call input chain.
PR12419
rdar://9770785
rdar://11195178
llvm-svn: 154370
2012-04-10 01:51:00 +00:00
Nadav Rotem
fb7e2ae53c
Lower some x86 shuffle sequences to the vblend family of instructions.
...
llvm-svn: 154313
2012-04-09 08:33:21 +00:00
Nadav Rotem
b801ca3976
Fix a bug in the lowering of broadcasts: ConstantPools need to use the target pointer type.
...
Move NormalizeVectorShuffle and LowerVectorBroadcast into X86TargetLowering.
llvm-svn: 154310
2012-04-09 07:45:58 +00:00
Chandler Carruth
16f0ebcbb5
Move the TLSModel information into the TargetMachine rather than hiding
...
in TargetLowering. There was already a FIXME about this location being
odd. The interface is simplified as a consequence. This will also make
it easier to change TLS models when compiling with PIE.
llvm-svn: 154292
2012-04-08 17:20:55 +00:00
Nadav Rotem
82609df647
AVX2: Build splat vectors by broadcasting a scalar from the constant pool.
...
Previously we used three instructions to broadcast an immediate value into a
vector register.
On Sandybridge we continue to load the broadcasted value from the constant pool.
llvm-svn: 154284
2012-04-08 12:54:54 +00:00
Benjamin Kramer
3cacabfb04
Fix narrowing conversion.
...
llvm-svn: 154171
2012-04-06 13:33:52 +00:00
Craig Topper
447417c932
Allow 256-bit shuffles to be split if a 128-bit lane contains elements from a single source. This is a rewrite of the 256-bit shuffle splitting code based on similar code from legalize types. Fixes PR12413.
...
llvm-svn: 154166
2012-04-06 07:45:23 +00:00
Rafael Espindola
ba0a6cabb8
Always compute all the bits in ComputeMaskedBits.
...
This allows us to keep passing reduced masks to SimplifyDemandedBits, but
know about all the bits if SimplifyDemandedBits fails. This allows instcombine
to simplify cases like the one in the included testcase.
llvm-svn: 154011
2012-04-04 12:51:34 +00:00
Nadav Rotem
b078350872
This commit contains a few changes that had to go in together.
...
1. Simplify xor/and/or (bitcast(A), bitcast(B)) -> bitcast(op (A,B))
(and also scalar_to_vector).
2. Xor/and/or are indifferent to the swizzle operation (shuffle of one src).
Simplify xor/and/or (shuff(A), shuff(B)) -> shuff(op (A, B))
3. Optimize swizzles of shuffles: shuff(shuff(x, y), undef) -> shuff(x, y).
4. Fix an X86ISelLowering optimization which was very bitcast-sensitive.
Code which was previously compiled to this:
movd (%rsi), %xmm0
movdqa .LCPI0_0(%rip), %xmm2
pshufb %xmm2, %xmm0
movd (%rdi), %xmm1
pshufb %xmm2, %xmm1
pxor %xmm0, %xmm1
pshufb .LCPI0_1(%rip), %xmm1
movd %xmm1, (%rdi)
ret
Now compiles to this:
movl (%rsi), %eax
xorl %eax, (%rdi)
ret
llvm-svn: 153848
2012-04-01 19:31:22 +00:00
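Item 2 of r153848 relies on the fact that an element-wise bitwise operation commutes with a swizzle when both operands are shuffled with the same mask. A minimal sketch with Python lists standing in for vectors follows; the helper names are hypothetical and XOR stands in for any of xor/and/or.

```python
def shuffle(vec, mask):
    """Single-source shuffle (swizzle): result[i] = vec[mask[i]]."""
    return [vec[i] for i in mask]


def vxor(a, b):
    """Element-wise XOR of two equal-length vectors."""
    return [x ^ y for x, y in zip(a, b)]


def commutes(a, b, mask):
    """True when xor(shuff(a), shuff(b)) equals shuff(xor(a, b))."""
    return vxor(shuffle(a, mask), shuffle(b, mask)) == shuffle(vxor(a, b), mask)
```

Applying the shuffle once after the operation instead of twice before it is what lets the longer pshufb sequence in the message collapse.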
Craig Topper
9cfc69c779
Spacing fixes and using 'unsigned' instead of 'int' to index to select shuffle elements for consistency with other shuffle code in X86 backend.
...
llvm-svn: 153154
2012-03-21 02:14:01 +00:00
Craig Topper
b34d96c614
Remove code that prevented lowering shuffles if they take a load as input and are themselves used by an extract_vector_elt. This was done to allow the DAG combiner to collapse to a single element load. Unfortunately, sometimes the extract_vector_elt would disappear before DAG combine could do the transformation leaving a vector_shuffle that isel couldn't handle. New code lets the shuffle be converted to a target specific node, but then adds a combine routine that can convert target specific nodes back to vector_shuffles if the folding criteria are met.
...
llvm-svn: 153080
2012-03-20 07:17:59 +00:00
Craig Topper
cbc96a6e90
Factor out target shuffle mask decoding from getShuffleScalarElt and use a SmallVector of int instead of unsigned for shuffle mask in decode functions. Preparation for another change.
...
llvm-svn: 153079
2012-03-20 06:42:26 +00:00
Craig Topper
129f9ef669
isCommutedMOVLMask should only look at 128-bit vectors to match isMOVLMask.
...
llvm-svn: 153027
2012-03-18 22:50:10 +00:00
Craig Topper
bef78fc2ee
Convert more static tables of registers used by calling convention to uint16_t to reduce space.
...
llvm-svn: 152538
2012-03-11 07:57:25 +00:00
Chad Rosier
9424aa1c51
Address Evan's comments for r151877.
...
Specifically, remove the magic number when checking to see if the copy has a
glue operand and simplify the checking logic.
rdar://10930395
llvm-svn: 152041
2012-03-05 19:27:12 +00:00
Chad Rosier
f5e086f18e
Prevent obscure and incorrect tail-call optimization.
...
In this instance we are generating the tail-call during legalizeDAG. The 2nd
floor call can't be a tail call because it clobbers %xmm1, which is defined by
the first floor call. The first floor call can't be a tail-call because it's
not in the tail position. The only reasonable way I could think to fix this
in a target-independent manner was to check for glue logic on the copy reg.
rdar://10930395
llvm-svn: 151877
2012-03-02 02:50:46 +00:00
Evan Cheng
65f9d19c4f
Re-commit r151623 with fix. Only issue special no-return calls if it's a direct call.
...
llvm-svn: 151645
2012-02-28 18:51:51 +00:00
Daniel Dunbar
ee7b899343
Revert r151623 "Some ARM implementations, e.g. A-series, do return stack prediction. ...", it is breaking the Clang build during the Compiler-RT part.
...
llvm-svn: 151630
2012-02-28 15:36:07 +00:00
Evan Cheng
87c7b09d8d
Some ARM implementations, e.g. A-series, do return stack prediction. That is,
...
the processor keeps a return address stack (RAS) which stores the address
and the instruction execution state of the instruction after a function-call
type branch instruction.
Calling a "noreturn" function with normal call instructions (e.g. bl) can
corrupt the RAS and cause 100% return misprediction, so LLVM should use an
unconditional branch instead, i.e.
mov lr, pc
b _foo
The "mov lr, pc" is issued in order to get proper backtrace.
rdar://8979299
llvm-svn: 151623
2012-02-28 06:42:03 +00:00
NAKAMURA Takumi
bdf94879df
Target/X86: Fix assertion failures and warnings caused by r151382 _ftol2 lowering for i386-*-win32 targets. Patch by Joe Groff.
...
[Joe Groff] Hi everyone. My previous patch applied as r151382 had a few problems:
Clang raised a warning, and X86 LowerOperation would assert out for
fptoui f64 to i32 because it improperly lowered to an illegal
BUILD_PAIR. Here's a patch that addresses these issues. Let me know if
any other changes are necessary. Thanks.
llvm-svn: 151432
2012-02-25 03:37:25 +00:00
Michael J. Spencer
248d65e78b
Add WIN_FTOL_* pseudo-instructions to model the unique calling convention
...
used by the Win32 _ftol2 runtime function. Patch by Joe Groff!
llvm-svn: 151382
2012-02-24 19:01:22 +00:00
Craig Topper
760b134ffa
Make all pointers to TargetRegisterClass const since they are all pointers to static data that should not be modified.
...
llvm-svn: 151134
2012-02-22 05:59:10 +00:00
Craig Topper
de121a1000
Remove some unneeded includes and fix ordering in X86ISelLowering.cpp. Remove unneeded 'using namespace'.
...
llvm-svn: 150916
2012-02-19 07:15:48 +00:00
Craig Topper
65a4ceea1e
Unify all shuffle mask checking functions to take a mask and VT instead of VectorShuffleSDNode.
...
llvm-svn: 150913
2012-02-19 05:41:45 +00:00
Craig Topper
3e5c04e432
Make a bunch of X86ISelLowering shuffle functions static now that they are no longer needed by isel.
...
llvm-svn: 150908
2012-02-19 02:53:47 +00:00
Jakob Stoklund Olesen
97e3115dc2
Use the same CALL instructions for Windows as for everything else.
...
The different calling conventions and call-preserved registers are
represented with regmask operands that are added dynamically.
llvm-svn: 150708
2012-02-16 17:56:02 +00:00
Jakob Stoklund Olesen
8a450cb2fa
Enable register mask operands for x86 calls.
...
Call instructions no longer have a list of 43 call-clobbered registers.
Instead, they get a single register mask operand with a bit vector of
call-preserved registers.
This saves a lot of memory, 42 x 32 bytes = 1344 bytes per call
instruction, and it speeds up building call instructions because those
43 imp-def operands no longer need to be added to use-def lists. (And
removed and shifted and re-added for every explicit call operand).
Passes like LiveVariables, LiveIntervals, RAGreedy, PEI, and
BranchFolding are significantly faster because they can deal with call
clobbers in bulk.
Overall, clang -O2 is between 0% and 8% faster, uniformly distributed
depending on call density in the compiled code. Debug builds using
clang -O0 are 0% - 3% faster.
I have verified that this patch doesn't change the assembly generated
for the LLVM nightly test suite when building with -disable-copyprop
and -disable-branch-fold.
Branch folding behaves slightly differently in a few cases because call
instructions have different hash values now.
Copy propagation flushes its data structures when it crosses a register
mask operand. This causes it to leave a few dead copies behind, on the
order of 20 instructions across the entire nightly test suite, including
SPEC. Fixing this properly would require the pass to use different data
structures.
llvm-svn: 150638
2012-02-16 00:02:50 +00:00
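The idea behind r150638 can be pictured as replacing a per-call list of clobbered registers with one shared bit vector of call-preserved registers: a register is clobbered by the call exactly when its bit is clear. The sketch below is illustrative only. The register numbering, helper names, and use of a Python integer as the bit vector are assumptions, not LLVM's data layout; the last line only reproduces the arithmetic quoted in the message.

```python
def make_regmask(preserved_regs):
    """Build a bit vector with one set bit per call-preserved register number."""
    mask = 0
    for reg in preserved_regs:
        mask |= 1 << reg
    return mask


def is_clobbered(regmask, reg):
    """A call clobbers a register exactly when its bit is clear in the mask."""
    return ((regmask >> reg) & 1) == 0


# Figures quoted in the commit message: 42 operands saved, 32 bytes each.
SAVED_BYTES_PER_CALL = 42 * 32
```

A pass can then test any register with a single bit lookup instead of walking dozens of implicit-def operands.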
Craig Topper
87119fa37f
Update CanXFormVExtractWithShuffleIntoLoad to ensure bitcasts of loads only have one use. Matches DAGCombiner and prevents vector_shuffles from reaching isel.
...
llvm-svn: 150360
2012-02-13 04:30:38 +00:00
Anton Korobeynikov
c6b4017ce2
Add support for implicit TLS model used with MS VC runtime.
...
Patch by Kai Nacke!
llvm-svn: 150307
2012-02-11 17:26:53 +00:00
Craig Topper
11826a6e10
Fix shuffle lowering code to stop creating temporary DAG nodes to do shuffle mask checks on. This seemed to be confusing things such that vector_shuffle ops got through to instruction selection. This is another step towards removing the vector_shuffle handling patterns from isel.
...
llvm-svn: 150296
2012-02-11 06:24:48 +00:00
Elena Demikhovsky
1adc1d53dd
Fixed a bug in printing "cmp" pseudo ops.
...
> This IR code
> %res = call <8 x float> @llvm.x86.avx.cmp.ps.256(<8 x float> %a0, <8 x float> %a1, i8 14)
> fails with assertion:
>
> llc: X86ATTInstPrinter.cpp:62: void llvm::X86ATTInstPrinter::printSSECC(const llvm::MCInst*, unsigned int, llvm::raw_ostream&): Assertion `0 && "Invalid ssecc argument!"' failed.
> 0 llc 0x0000000001355803
> 1 llc 0x0000000001355dc9
> 2 libpthread.so.0 0x00007f79a30575d0
> 3 libc.so.6 0x00007f79a23a1945 gsignal + 53
> 4 libc.so.6 0x00007f79a23a2f21 abort + 385
> 5 libc.so.6 0x00007f79a239a810 __assert_fail + 240
> 6 llc 0x00000000011858d5 llvm::X86ATTInstPrinter::printSSECC(llvm::MCInst const*, unsigned int, llvm::raw_ostream&) + 119
I added the full testing for all possible pseudo-ops of cmp.
I extended X86AsmPrinter.cpp and X86IntelInstPrinter.cpp.
You'll also see line alignments (unrelated to this fix) in X86IselLowering.cpp from my previous check-in.
llvm-svn: 150068
2012-02-08 08:37:26 +00:00
Craig Topper
5405571fe0
Remove GCC builtins for vpermilp* intrinsics as clang no longer needs them. Custom lower the intrinsics to the vpermilp target specific node and remove intrinsic patterns.
...
llvm-svn: 150060
2012-02-08 06:36:57 +00:00
Craig Topper
b27fd77c3f
Add instruction selection for 256-bit VPSHUFD and 128-bit VPERMILPS/VPERMILPD.
...
llvm-svn: 149968
2012-02-07 06:28:42 +00:00
Chris Lattner
8213c8af29
Remove some dead code and tidy things up now that vectors use ConstantDataVector
...
instead of always using ConstantVector.
llvm-svn: 149912
2012-02-06 21:56:39 +00:00
Benjamin Kramer
2496717052
X86: Don't call malloc for 4 bits. No functionality change.
...
llvm-svn: 149866
2012-02-06 12:06:18 +00:00
Craig Topper
1f71057747
Add shuffle decoding support for 256-bit pshufd. Merge vpermilp* and pshufd decoding.
...
llvm-svn: 149859
2012-02-06 07:17:51 +00:00
Duncan Sands
ae22c60f90
Persuade GCC that there is nothing worth warning about here (there isn't).
...
llvm-svn: 149834
2012-02-05 14:20:11 +00:00
Craig Topper
4ed7278ff4
Convert assert(0) to llvm_unreachable in X86 Target directory.
...
llvm-svn: 149809
2012-02-05 05:38:58 +00:00
Craig Topper
83f3bdaa45
Convert some assert(0) in default of switch statements to llvm_unreachable.
...
llvm-svn: 149808
2012-02-05 03:43:23 +00:00
Craig Topper
1d471e31ba
Add target specific node for PMULUDQ. Change patterns to use it and custom lower intrinsics to it. Use it instead of intrinsic to handle 64-bit vector multiplies.
...
llvm-svn: 149807
2012-02-05 03:14:49 +00:00
Craig Topper
47e6d26911
Remove getShuffleVPERMILPImmediate function, getShuffleSHUFImmediate performs the same calculation.
...
llvm-svn: 149683
2012-02-03 06:52:33 +00:00
Craig Topper
d5ffe0900d
Remove unnecessary qualification on 256-bit vector handling in LowerBUILD_VECTOR. Condition was already guaranteed by earlier code.
...
llvm-svn: 149680
2012-02-03 06:32:21 +00:00
Lang Hames
bb682450f9
Incorporate Chad, Jakob and Evan's suggestions on r149957.
...
llvm-svn: 149655
2012-02-03 01:13:49 +00:00
Jakob Stoklund Olesen
5e1ac45b93
Require non-NULL register masks.
...
It doesn't seem worthwhile to give meaning to a NULL register mask
pointer. It complicates all the code using register mask operands.
llvm-svn: 149646
2012-02-02 23:52:57 +00:00
Elena Demikhovsky
6fbb4d2842
Minor change in signature of the getZeroVector()
...
llvm-svn: 149601
2012-02-02 09:20:18 +00:00
Elena Demikhovsky
fb44980b41
Optimization for SIGN_EXTEND operation on AVX.
...
Special handling was added for v4i32 -> v4i64 and v8i16 -> v8i32
extensions.
llvm-svn: 149600
2012-02-02 09:10:43 +00:00
Francois Pichet
26f302d568
Unbreak the MSVC build.
...
llvm-svn: 149599
2012-02-02 08:36:09 +00:00
Lang Hames
0269caafa6
Set EFLAGS correctly in EmitLoweredSelect on X86.
...
llvm-svn: 149597
2012-02-02 07:48:37 +00:00
Andrew Trick
8523b16ff5
Instruction scheduling itinerary for Intel Atom.
...
Adds an instruction itinerary to all x86 instructions, giving each a default latency of 1, using the InstrItinClass IIC_DEFAULT.
Sets specific latencies for Atom for the instructions in files X86InstrCMovSetCC.td, X86InstrArithmetic.td, X86InstrControl.td, and X86InstrShiftRotate.td. The Atom latencies for the remainder of the x86 instructions will be set in subsequent patches.
Adds a test to verify that the scheduler is working.
Also changes the scheduling preference to "Hybrid" for i386 Atom, while leaving x86_64 as ILP.
Patch by Preston Gurd!
llvm-svn: 149558
2012-02-01 23:20:51 +00:00
Mon P Wang
9f05206659
Avoid creating an extract element to an illegal type after LegalizeTypes has run.
...
llvm-svn: 149548
2012-02-01 22:15:20 +00:00
Chad Rosier
e273cb08c4
Tidy up.
...
llvm-svn: 149521
2012-02-01 18:45:51 +00:00
Elena Demikhovsky
34cca175ab
Shortened code in shuffle masks
...
llvm-svn: 149493
2012-02-01 10:33:05 +00:00
Elena Demikhovsky
0e48c70ba7
Optimization for "truncate" operation on AVX.
...
Truncating v4i64 -> v4i32 and v8i32 -> v8i16 may be done with a set of shuffles.
llvm-svn: 149485
2012-02-01 07:56:44 +00:00
Craig Topper
9cdb8bdf04
Don't create VBROADCAST nodes if any nodes use the chain result from the load. Fixes PR11900.
...
llvm-svn: 149478
2012-02-01 06:51:58 +00:00
Craig Topper
b85e40f738
Remove pcmpgt/pcmpeq intrinsics as clang is not using them.
...
llvm-svn: 149367
2012-01-31 06:52:44 +00:00
Benjamin Kramer
396c590818
Fix refacto.
...
llvm-svn: 149269
2012-01-30 20:01:35 +00:00
Douglas Gregor
e577cfe172
Eliminate narrowing conversion in initializer list, to make C++11 happy
...
llvm-svn: 149254
2012-01-30 16:57:18 +00:00
Benjamin Kramer
20af25f47b
X86: Simplify shuffle mask generation code.
...
llvm-svn: 149248
2012-01-30 15:16:21 +00:00
Craig Topper
516cba3380
Fix pattern for memory form of PSHUFD for use with FP vectors to remove bitcast to an integer vector that normal code wouldn't have. Also remove bitcasts from code that turns splat vector loads into a shuffle as it was making the broken pattern necessary.
...
llvm-svn: 149232
2012-01-30 07:50:31 +00:00
Craig Topper
ca29bcfc10
Move some XOP patterns into instruction definition. Replace VPCMOV intrinsic patterns with custom lowering to target specific nodes.
...
llvm-svn: 149216
2012-01-30 01:10:15 +00:00
Craig Topper
b91760eff8
Remove some more patterns by custom lowering intrinsics to target specific nodes.
...
llvm-svn: 149052
2012-01-26 07:18:03 +00:00
Chris Lattner
33633a90a0
fix a bug I introduced in r148929, this is not a splat!
...
Thanks to Eli for noticing.
llvm-svn: 148947
2012-01-25 09:56:22 +00:00
Craig Topper
7834900950
Custom lower PSIGN and PSHUFB intrinsics to their corresponding target specific nodes so we can remove the isel patterns.
...
llvm-svn: 148933
2012-01-25 06:43:11 +00:00
Chris Lattner
47a86bdbe2
use ConstantVector::getSplat in a few places.
...
llvm-svn: 148929
2012-01-25 06:02:56 +00:00
Craig Topper
ce4f9c5668
Custom lower phadd and phsub intrinsics to target specific nodes. Remove the patterns that are no longer necessary.
...
llvm-svn: 148927
2012-01-25 05:37:32 +00:00
Elena Demikhovsky
0b0c5d8c4c
ZERO_EXTEND operation is optimized for AVX.
...
v8i16 -> v8i32, v4i32 -> v4i64 - used vpunpck* instructions.
llvm-svn: 148803
2012-01-24 13:54:13 +00:00
Craig Topper
edd1d0acfc
Custom lower PCMPEQ/PCMPGT intrinsics to target specific nodes and remove the intrinsic patterns.
...
llvm-svn: 148687
2012-01-23 08:18:28 +00:00
Craig Topper
6b90c5d03e
Update more places to use target specific nodes for vector shifts instead of intrinsics.
...
llvm-svn: 148685
2012-01-23 06:46:22 +00:00
Craig Topper
5e80db4e4f
Custom lower vector shift intrinsics to target specific nodes and remove the patterns that are no longer needed.
...
llvm-svn: 148684
2012-01-23 06:16:53 +00:00
Craig Topper
0b7ad76bd0
Combine X86 CMPPD and CMPPS node types. Simplifies selection code and pattern matching.
...
llvm-svn: 148670
2012-01-22 23:36:02 +00:00
Craig Topper
bd4884371b
Merge PCMPEQB/PCMPEQW/PCMPEQD/PCMPEQQ and PCMPGTB/PCMPGTW/PCMPGTD/PCMPGTQ X86 ISD node types into only two node types. Simplifying opcode selection and pattern matching.
...
llvm-svn: 148667
2012-01-22 22:42:16 +00:00
Craig Topper
094626414d
Add target specific ISD node types for SSE/AVX vector shuffle instructions and change all the code that used to create intrinsic nodes to create the new nodes instead.
...
llvm-svn: 148664
2012-01-22 19:15:14 +00:00
Craig Topper
a4ed5246d8
Make code a little less verbose.
...
llvm-svn: 148651
2012-01-22 03:07:48 +00:00
Craig Topper
cb3433cd58
Remove unused X86 ISD node type defines.
...
llvm-svn: 148644
2012-01-22 01:15:56 +00:00
Craig Topper
39bc1e4d25
Fix PR11819 introduced by r148537. I'd commit the test case, but the generated code is terrible as it gets fully scalarized. Expect a future commit to fix that.
...
llvm-svn: 148632
2012-01-21 08:49:33 +00:00
David Blaikie
46a9f016c5
More dead code removal (using -Wunreachable-code)
...
llvm-svn: 148578
2012-01-20 21:51:11 +00:00
Craig Topper
a409479023
Improve 256-bit shuffle splitting to allow 2 sources in each 128-bit lane, as long as only a single lane of each source is used in each lane of the destination. This makes the splitting match much closer to what happens with 256-bit shuffles when AVX is disabled and only 128-bit XMM is allowed.
...
llvm-svn: 148537
2012-01-20 09:29:03 +00:00
Craig Topper
3469212c82
Add support for selecting 256-bit PALIGNR.
...
llvm-svn: 148532
2012-01-20 05:53:00 +00:00
Eli Friedman
32c7c25dcb
Support MSVC x86-32 sret convention. PR11688. Patch by Joe Groff.
...
llvm-svn: 148513
2012-01-20 00:05:46 +00:00
Craig Topper
80576e8d1f
Merge 128-bit and 256-bit SHUFPS/SHUFPD handling.
...
llvm-svn: 148466
2012-01-19 08:19:12 +00:00
Nick Lewycky
ecc0084f72
Add a TargetOption for disabling tail calls.
...
llvm-svn: 148442
2012-01-19 00:34:10 +00:00
Jakob Stoklund Olesen
ff482f733b
Add experimental -x86-use-regmask command line option.
...
It adds register mask operands to x86 call instructions. Once all the
backend passes support register mask operands, this will be permanently
enabled.
llvm-svn: 148438
2012-01-18 23:52:22 +00:00
Nadav Rotem
86c3807b99
Fix warning.
...
llvm-svn: 148301
2012-01-17 09:31:09 +00:00
Nadav Rotem
86e5390dbf
Fix PR11769.
...
In CanXFormVExtractWithShuffleIntoLoad we assumed that EXTRACT_VECTOR_ELT can be later handled by the DAGCombiner.
However, in some cases on AVX, the EXTRACT_VECTOR_ELT is legalized to EXTRACT_SUBVECTOR + EXTRACT_VECTOR_ELT, which
currently is not handled by the DAGCombiner. In this patch I added a check that we only extract from the XMM part.
llvm-svn: 148298
2012-01-17 09:13:19 +00:00
Craig Topper
9cafcd8baa
Remove unnecessary AVX check from an assert. hasSSE2 is enough.
...
llvm-svn: 148295
2012-01-17 08:23:44 +00:00
Craig Topper
37b10ef250
Fix a crasher when PerformShiftCombine receives a BUILD_VECTOR of all UNDEF. Probably could use better handling in DAG combine or getNode. Fixes PR11772.
...
llvm-svn: 148285
2012-01-17 04:44:50 +00:00
Nadav Rotem
57935243bd
[AVX] Optimize x86 VSELECT instructions using SimplifyDemandedBits.
...
We know that the blend instructions only use the MSB, so if the mask is
sign-extended then we can convert it into a SHL instruction. This is a
common pattern because the type-legalizer sign-extends the i1 type which
is used by the LLVM-IR for the condition.
Added a new optimization in SimplifyDemandedBits for SIGN_EXTEND_INREG -> SHL.
llvm-svn: 148225
2012-01-15 19:27:55 +00:00
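The reasoning in r148225 is that a blend reads only the most significant bit of each mask element, so a sign-extension from i1 and a left shift that moves bit 0 into the top position are interchangeable as far as the blend is concerned. The sketch below checks this on a single 8-bit lane; the lane width and helper names are assumptions for illustration.

```python
BITS = 8                 # assumed lane width
MASK = (1 << BITS) - 1


def sext_inreg_i1(x):
    """Sign-extend bit 0 across the lane (SIGN_EXTEND_INREG from i1)."""
    return MASK if x & 1 else 0


def shl_to_msb(x):
    """Shift bit 0 into the most significant bit of the lane (SHL)."""
    return (x << (BITS - 1)) & MASK


def msb(x):
    """The only bit a blend instruction inspects."""
    return (x >> (BITS - 1)) & 1


def same_for_blend():
    """Both forms agree on the MSB for every possible lane value."""
    return all(msb(sext_inreg_i1(x)) == msb(shl_to_msb(x))
               for x in range(1 << BITS))
```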
Benjamin Kramer
339ced4e34
Return an ArrayRef from ShuffleVectorSDNode::getMask and push it through CodeGen.
...
llvm-svn: 148218
2012-01-15 13:16:05 +00:00
Craig Topper
b1c2ebf6ee
use v8i32 as optimal mem type over v8f32 if AVX2 is enabled. Similar to SSE2 vs SSE1.
...
llvm-svn: 148109
2012-01-13 08:32:21 +00:00
Craig Topper
cb7e13d7c0
Make X86 instruction selection use 256-bit VPXOR for build_vector of all ones if AVX2 is enabled. This gives the ExeDepsFix pass a chance to choose FP vs int as appropriate. Also use v8i32 as the type for getZeroVector if AVX2 is enabled. This is consistent with SSE2 preferring v4i32.
...
llvm-svn: 148108
2012-01-13 08:12:35 +00:00
Craig Topper
2aa07f832e
Fix typo in PerformAddCombine that caused any vector type to be checked for horizontal add/sub if AVX2 is enabled. This caused an assert to fail for non 128/256-bit vectors when done before type legalizing. Fixes PR11749.
...
llvm-svn: 148096
2012-01-13 05:04:25 +00:00
Elena Demikhovsky
060f6ccdb8
Fixed a bug in LowerVECTOR_SHUFFLE that caused an assertion failure
...
lc: X86ISelLowering.cpp:6480: llvm::SDValue llvm::X86TargetLowering::LowerVECTOR_SHUFFLE(llvm::SDValue, llvm::SelectionDAG&) const: Assertion `V1.getOpcode() != ISD::UNDEF&& "Op 1 of shuffle should not be undef"' failed.
Added a test.
llvm-svn: 148044
2012-01-12 20:33:10 +00:00
Nadav Rotem
0a0a829bea
Fix a bug in the AVX 256-bit shuffle code in cases where the splat element is on the boundary of two 128-bit vectors.
...
The attached testcase was stuck in an endless loop.
llvm-svn: 148027
2012-01-12 15:31:55 +00:00
Rafael Espindola
6635ae1c17
Explicitly set the scale to 1 on some segstack prologue instrs.
...
Patch by Brian Anderson.
llvm-svn: 147952
2012-01-11 18:14:03 +00:00
Nadav Rotem
baae7e4577
Fix a bug in the lowering of BUILD_VECTOR for AVX. SCALAR_TO_VECTOR does not zero untouched elements. Use INSERT_VECTOR_ELT instead.
...
llvm-svn: 147948
2012-01-11 14:07:51 +00:00
Lang Hames
995c63329a
Fixed order of operands in comment to match code.
...
llvm-svn: 147890
2012-01-10 22:53:20 +00:00
Bill Wendling
d5ab02600e
For i386, don't use the generic code.
...
As the comment around 7746 says, it's better to use the x87 extended precision
here than SSE. And the generic code doesn't know how to do that. It also regains
the speed lost for the uint64_to_float.c testcase.
<rdar://problem/10669858>
llvm-svn: 147869
2012-01-10 19:41:30 +00:00
Craig Topper
430f3f1bd6
Fix a crash in AVX2 when trying to broadcast a double into a 128-bit vector. There is no vbroadcastsd xmm, but we do need to support 64-bit integers broadcasted into xmm. Also factor the AVX check into the isVectorBroadcast function. This makes more sense since the AVX2 check was already inside.
...
llvm-svn: 147844
2012-01-10 08:23:59 +00:00
Craig Topper
b0c0f72ae6
Remove hasXMM/hasXMMInt functions. Move callers to hasSSE1/hasSSE2. This is the final piece to remove the AVX hack that disabled SSE.
...
llvm-svn: 147843
2012-01-10 06:54:16 +00:00
Craig Topper
d97bbd7b60
Remove hasSSE*orAVX functions and change all callers to use just hasSSE*. AVX is now an SSE level and no longer disables SSE checks.
...
llvm-svn: 147842
2012-01-10 06:37:29 +00:00
Craig Topper
210e4f81b3
Change some places that were checking for AVX OR SSE1/2 to use hasXMM/hasXMMInt instead. Also fix one place that checked SSE3, but accidentally excluded AVX to use hasSSE3orAVX. This is a step towards removing the AVX hack from the X86Subtarget.h
...
llvm-svn: 147764
2012-01-09 02:28:15 +00:00
Victor Umansky
540651cf59
Reverted commit #147601 upon Evan's request.
...
llvm-svn: 147748
2012-01-08 17:20:33 +00:00
Benjamin Kramer
6898db6269
Remove VectorExtras. This unused helper was written for a type of API that is discouraged now.
...
llvm-svn: 147738
2012-01-07 19:42:13 +00:00
Craig Topper
ca66bba45e
Remove unnecessary check of hasAVX(). It's already included in hasXMM().
...
llvm-svn: 147734
2012-01-07 18:48:43 +00:00
Eric Christopher
c206d46709
Make the 'x' constraint work for AVX registers as well.
...
Fixes rdar://10614894
llvm-svn: 147704
2012-01-07 01:02:09 +00:00
Victor Umansky
9255b6d9fe
Peephole optimization of ptest-conditioned branch in X86 arch. Performs instruction combining of sequences generated by ptestz/ptestc intrinsics to ptest+jcc pair for SSE and AVX.
...
Testing: passed 'make check' including LIT tests for all sequences being handled (both SSE and AVX)
Reviewers: Evan Cheng, David Blaikie, Bruno Lopes, Elena Demikhovsky, Chad Rosier, Anton Korobeynikov
llvm-svn: 147601
2012-01-05 08:46:19 +00:00
Bill Wendling
ac27f0c830
Replace the uint64_t -> double conversion algorithm with one that's more efficient.
...
This small bit of ASM code is sufficient to do what the old algorithm did:
movq %rax, %xmm0
punpckldq (c0), %xmm0 // c0: (uint4){ 0x43300000U, 0x45300000U, 0U, 0U }
subpd (c1), %xmm0 // c1: (double2){ 0x1.0p52, 0x1.0p52 * 0x1.0p32 }
#ifdef __SSE3__
haddpd %xmm0, %xmm0
#else
pshufd $0x4e, %xmm0, %xmm1
addpd %xmm1, %xmm0
#endif
It's arguably faster. One caveat, the 'haddpd' instruction isn't very fast on
all processors.
<rdar://problem/7719814>
llvm-svn: 147593
2012-01-05 02:13:20 +00:00
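The assembly in r147593 can be followed step by step in scalar form. The interleave with c0 places each 32-bit half of the input into the low mantissa bits of a double whose exponent is 2^52 (low half) or 2^84 (high half); subtracting c1 removes those biases exactly, and the final add rounds once. The sketch below emulates this with Python's struct module. It is an illustration of the arithmetic, not the lowering code, and the function names are hypothetical.

```python
import struct


def bits_to_double(bits):
    """Reinterpret a 64-bit integer pattern as an IEEE-754 double."""
    return struct.unpack("<d", struct.pack("<Q", bits))[0]


def u64_to_double(x):
    """Convert an unsigned 64-bit integer using the bias trick from the log."""
    lo = x & 0xFFFFFFFF
    hi = x >> 32
    # punpckldq with c0: pair each half with a fixed exponent word.
    d0 = bits_to_double((0x43300000 << 32) | lo)   # equals 2^52 + lo
    d1 = bits_to_double((0x45300000 << 32) | hi)   # equals 2^84 + hi * 2^32
    # subpd with c1 = { 2^52, 2^52 * 2^32 }: both subtractions are exact.
    d0 -= 2.0 ** 52
    d1 -= 2.0 ** 84
    # haddpd (or pshufd + addpd): the single rounding step.
    return d0 + d1
```

Since both partial values are exact, the sum is rounded only once, which is why the result matches a direct integer-to-double conversion.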