Commit Graph

2285 Commits

Author SHA1 Message Date
Elena Demikhovsky 5f2f06d2d9 Simplified TRUNCATE operation that comes after SETCC. It is possible since SETCC result is 0 or -1.
Added a test.

llvm-svn: 171467
2013-01-03 08:48:33 +00:00
Hal Finkel 95de3f3018 Add a subtype parameter to VTTI::getShuffleCost
In order to cost subvector insertion and extraction, we need to know
the type of the subvector being extracted.

No functionality change.

llvm-svn: 171453
2013-01-03 02:34:09 +00:00
Nadav Rotem 761937a757 AVX: Fix a bug in WidenMaskArithmetic.
llvm-svn: 171398
2013-01-02 17:41:03 +00:00
Chandler Carruth 9fb823bbd4 Move all of the header files which are involved in modelling the LLVM IR
into their new header subdirectory: include/llvm/IR. This matches the
directory structure of lib, and begins to correct a long standing point
of file layout clutter in LLVM.

There are still more header files to move here, but I wanted to handle
them in separate commits to make tracking what files make sense at each
layer easier.

The only really questionable files here are the target intrinsic
tablegen files. But that's a battle I'd rather not fight today.

I've updated both CMake and Makefile build systems (I think, and my
tests think, but I may have missed something).

I've also re-sorted the includes throughout the project. I'll be
committing updates to Clang, DragonEgg, and Polly momentarily.

llvm-svn: 171366
2013-01-02 11:36:10 +00:00
Bill Wendling 749a43d874 Use the predicate methods off of AttributeSet instead of Attribute.
llvm-svn: 171257
2012-12-30 13:50:49 +00:00
Bill Wendling 698e84fc4f Remove the Function::getFnAttributes method in favor of using the AttributeSet
directly.

This is in preparation for removing the use of the 'Attribute' class as a
collection of attributes. That will shift to the AttributeSet class instead.

llvm-svn: 171253
2012-12-30 10:32:01 +00:00
Craig Topper fe82eb6bcd Remove intrinsic specific instructions for (V)SQRTPS/PD. Instead lower to target-independent ISD nodes and use the existing patterns for those.
llvm-svn: 171237
2012-12-29 18:18:20 +00:00
Craig Topper f4a9c6e21b Merge similar functionality using a nested switch.
llvm-svn: 171229
2012-12-29 17:19:06 +00:00
Craig Topper 6b27251a76 Remove intrinsic specific instructions for SSE/SSE2/AVX floating point max/min instructions. Lower them to target specific nodes and use those patterns instead. This also allows them to be commuted if UnsafeFPMath is enabled.
llvm-svn: 171227
2012-12-29 16:44:25 +00:00
Jakub Staszak 215f94143c Simplify code, no functionality change.
llvm-svn: 171226
2012-12-29 15:57:26 +00:00
Nadav Rotem 9785f519b4 CostModel: initial checkin for code that estimates the cost of special shuffles.
llvm-svn: 171180
2012-12-28 08:19:03 +00:00
Nadav Rotem c982a2dc25 wrap 80-col lines.
llvm-svn: 171179
2012-12-28 07:28:43 +00:00
Nadav Rotem 3da9ac72fa AVX: Move the ZEXT/ANYEXT DAGCo optimizations to the lowering of these optimizations. The old test cases still cover all of these lowering/optimizations. The single change that we have is that now anyext does not need to zero a register, because it does not use the exact code path as the zero_extend.
llvm-svn: 171178
2012-12-28 05:45:24 +00:00
Nadav Rotem 68441914a5 Reverse the 'if' condition and reduce the indentation.
llvm-svn: 171172
2012-12-27 23:08:05 +00:00
Nadav Rotem 3b34190100 AVX/AVX2: Move the SEXT lowering code from a target specific DAGco to a lowering function.
llvm-svn: 171170
2012-12-27 22:47:16 +00:00
Nadav Rotem 2a054b4475 On AVX/AVX2 the type v8i1 is legalized to v8i16, which is an XMM sized
register. In most cases we actually compare or select YMM-sized registers
and mixing the two types creates horrible code. This commit optimizes
some of the transition sequences.

PR14657.

llvm-svn: 171148
2012-12-27 08:15:45 +00:00
Nadav Rotem 8e5d80eba3 AVX/AVX2: Move the code that lowers vector-trunc from a DAGCo-hook to custom lowering hook.
The vector truncs were scalarized during LegalizeVectorOps, later vectorized again by some DAGCombine optimization
and finally, lowered by a dagcombing optimization. Now, they are properly lowered during LegalizeVectorOps.
No new testcase because the original testcases still work.

llvm-svn: 171146
2012-12-27 07:45:10 +00:00
Nadav Rotem 5267bb71b8 Reformat the docs.
llvm-svn: 171091
2012-12-26 04:59:20 +00:00
Benjamin Kramer 81b5a8fd2e X86: Shave off one shuffle from the pcmpeqq sequence for SSE2 by making use of and commutativity.
llvm-svn: 171064
2012-12-25 13:09:08 +00:00
Benjamin Kramer df4af41b9b X86: Custom lower <2 x i64> eq and ne when SSE41 is not available.
pcmpeqd, pshufd, pshufd, pand is cheaper than unpack + cmpq, sbbq, cmpq, sbbq + pack.
Small speedup on loop-vectorized viterbi (-march=core2).

llvm-svn: 171063
2012-12-25 12:54:19 +00:00
Nick Lewycky 521e0d59f3 Quiet gcc's -Wparenthesis warning. No functionality change.
llvm-svn: 171044
2012-12-24 19:58:45 +00:00
Nadav Rotem b15c69a725 whitespace
llvm-svn: 170997
2012-12-23 07:33:44 +00:00
Nadav Rotem 2cade68025 Loop Vectorizer: Update the cost model of scatter/gather operations and make
them more expensive.

llvm-svn: 170995
2012-12-23 07:23:55 +00:00
Benjamin Kramer 76268ac682 X86: Turn mul of <4 x i32> into pmuludq when no SSE4.1 is available.
pmuludq is slow, but it turns out that all the unpacking and packing of the
scalarized mul is even slower. 10% speedup on loop-vectorized paq8p.

llvm-svn: 170985
2012-12-22 16:07:56 +00:00
Benjamin Kramer b2f0a2bd4b X86: Emit vector sext as shuffle + sra if vpmovsx is not available.
Also loosen the SSSE3 dependency a bit, expanded pshufb + psra is still better
than scalarized loads. Fixes PR14590.

llvm-svn: 170984
2012-12-22 11:34:28 +00:00
Benjamin Kramer 82d1c371e2 X86: Match pmin/pmax as a target specific dag combine. This occurs during vectorization.
Part of PR14667.

llvm-svn: 170908
2012-12-21 17:46:58 +00:00
Benjamin Kramer 4669d18893 X86: Match the SSE/AVX min/max vector ops using a custom node instead of intrinsics
This is very mechanical, no functionality change. Preparation for PR14667.

llvm-svn: 170898
2012-12-21 14:04:55 +00:00
Nadav Rotem 6d4fdd6d2c Improve the X86 cost model for loads and stores.
llvm-svn: 170830
2012-12-21 01:33:59 +00:00
Patrik Hagglund e09cac9a67 Change TargetLowering::getTypeForExtArgOrReturn to take and return
MVTs, instead of EVTs.

llvm-svn: 170537
2012-12-19 12:02:25 +00:00
Patrik Hagglund f9eb168ef4 Change TargetLowering::findRepresentativeClass to take an MVT, instead
of EVT.

llvm-svn: 170532
2012-12-19 11:30:36 +00:00
NAKAMURA Takumi 89209462fe X86ISelLowering.cpp: Fix warnings. [-Wlogical-op-parentheses]
llvm-svn: 170523
2012-12-19 10:12:48 +00:00
Elena Demikhovsky 14a4af0e66 Optimized load + SIGN_EXTEND patterns in the X86 backend.
llvm-svn: 170506
2012-12-19 07:50:20 +00:00
Bill Wendling 3d7b0b8ac7 Rename the 'Attributes' class to 'Attribute'. It's going to represent a single attribute in the future.
llvm-svn: 170502
2012-12-19 07:18:57 +00:00
Jakub Staszak 338863a546 Reverse order of checking SSE level when calculating compare cost, so we check
AVX2 before AVX.

llvm-svn: 170464
2012-12-18 22:57:56 +00:00
Craig Topper f3ff6ae066 Simplify BMI ANDN matching to use patterns instead of a DAG combine. Also add ANDN to isDefConvertible.
llvm-svn: 170305
2012-12-17 05:12:30 +00:00
Benjamin Kramer b16ccde7a4 X86: Add a couple of target-specific dag combines that turn VSELECTS into psubus if possible.
We match the pattern "x >= y ? x-y : 0" into "subus x, y" and two special cases
if y is a constant. DAGCombiner canonicalizes those so we first have to undo the
canonicalization for those cases. The pattern occurs in gzip when the loop
vectorizer is enabled. Part of PR14613.

llvm-svn: 170273
2012-12-15 16:47:44 +00:00
Nadav Rotem 8487537bdb TypeLegalizer: Do not generate target specific nodes with illegal types, because we cant type-legalize them.
llvm-svn: 170245
2012-12-14 21:20:37 +00:00
Evan Cheng 962711ee71 Sorry about the churn. One more change to getOptimalMemOpType() hook. Did I
mention the inline memcpy / memset expansion code is a mess?

This patch split the ZeroOrLdSrc argument into two: IsMemset and ZeroMemset.
The first indicates whether it is expanding a memset or a memcpy / memmove.
The later is whether the memset is a memset of zero. It's totally possible
(likely even) that targets may want to do different things for memcpy and
memset of zero.

llvm-svn: 169959
2012-12-12 02:34:41 +00:00
Evan Cheng c3d1aca657 - Rename isLegalMemOpType to isSafeMemOpType. "Legal" is a very overloade term.
Also added more comments to explain why it is generally ok to return true.
- Rename getOptimalMemOpType argument IsZeroVal to ZeroOrLdSrc. It's meant to
be true for loaded source (memcpy) or zero constants (memset). The poor name
choice is probably some kind of legacy issue.

llvm-svn: 169954
2012-12-12 01:32:07 +00:00
Evan Cheng 04e5518783 Avoid using lossy load / stores for memcpy / memset expansion. e.g.
f64 load / store on non-SSE2 x86 targets.

llvm-svn: 169944
2012-12-12 00:42:09 +00:00
Patrik Hagglund e98b7a0389 Revert EVT->MVT changes, r169836-169851, due to buildbot failures.
llvm-svn: 169854
2012-12-11 11:14:33 +00:00
Patrik Hagglund ad432a8e70 Change TargetLowering::getTypeForExtArgOrReturn to take and return
MVTs, instead of EVTs.

Accordingly, add bitsLT (and similar) to MVT.

llvm-svn: 169850
2012-12-11 10:20:51 +00:00
Patrik Hagglund 8d2e7cf561 Change TargetLowering::findRepresentativeClass to take an MVT, instead
of EVT.

llvm-svn: 169845
2012-12-11 09:57:18 +00:00
Evan Cheng 79e2ca90bc Some enhancements for memcpy / memset inline expansion.
1. Teach it to use overlapping unaligned load / store to copy / set the trailing
   bytes. e.g. On 86, use two pairs of movups / movaps for 17 - 31 byte copies.
2. Use f64 for memcpy / memset on targets where i64 is not legal but f64 is. e.g.
   x86 and ARM.
3. When memcpy from a constant string, do *not* replace the load with a constant
   if it's not possible to materialize an integer immediate with a single
   instruction (required a new target hook: TLI.isIntImmLegal()).
4. Use unaligned load / stores more aggressively if target hooks indicates they
   are "fast".
5. Update ARM target hooks to use unaligned load / stores. e.g. vld1.8 / vst1.8.
   Also increase the threshold to something reasonable (8 for memset, 4 pairs
   for memcpy).

This significantly improves Dhrystone, up to 50% on ARM iOS devices.

rdar://12760078

llvm-svn: 169791
2012-12-10 23:21:26 +00:00
Shuxin Yang 95de7c37e2 - Re-enable population count loop idiom recognization
- fix a bug which cause sigfault.
- add two testing cases which was causing crash

llvm-svn: 169687
2012-12-09 03:12:46 +00:00
Chandler Carruth 91e47532fe Revert the patches adding a popcount loop idiom recognition pass.
There are still bugs in this pass, as well as other issues that are
being worked on, but the bugs are crashers that occur pretty easily in
the wild. Test cases have been sent to the original commit's review
thread.

This reverts the commits:
  r169671: Fix a logic error.
  r169604: Move the popcnt tests to an X86 subdirectory.
  r168931: Initial commit adding the pass.

llvm-svn: 169683
2012-12-08 22:18:29 +00:00
Bill Wendling e94d843e43 s/AttrListPtr/AttributeSet/g to better label what this class is going to be in the near future.
llvm-svn: 169651
2012-12-07 23:16:57 +00:00
Nadav Rotem ad0b5fbe8c When we use the BLEND instruction that uses the MSB as a mask, we can remove
the VSRI instruction before it since it does not affect the MSB.

Thanks Craig Topper for suggesting this.

llvm-svn: 169638
2012-12-07 21:43:11 +00:00
Nadav Rotem 481e50efe0 X86: Prefer using VPSHUFD over VPERMIL because it has better throughput.
llvm-svn: 169624
2012-12-07 19:01:13 +00:00
Evan Cheng 9ec512d768 Replace r169459 with something safer. Rather than having computeMaskedBits to
understand target implementation of any_extend / extload, just generate
zero_extend in place of any_extend for liveouts when the target knows the
zero_extend will be implicit (e.g. ARM ldrb / ldrh) or folded (e.g. x86 movz).

rdar://12771555

llvm-svn: 169536
2012-12-06 19:13:27 +00:00
Jakub Staszak 40ee5674cd Remove unneeded function, since PR8156 was fixed over a year ago.
llvm-svn: 169534
2012-12-06 19:05:46 +00:00
Jakub Staszak 65ca2fb9e6 Simplify code.
llvm-svn: 169521
2012-12-06 18:22:59 +00:00
Evan Cheng 5213139f48 Let targets provide hooks that compute known zero and ones for any_extend
and extload's. If they are implemented as zero-extend, or implicitly
zero-extend, then this can enable more demanded bits optimizations. e.g.

define void @foo(i16* %ptr, i32 %a) nounwind {
entry:
  %tmp1 = icmp ult i32 %a, 100
  br i1 %tmp1, label %bb1, label %bb2
bb1:
  %tmp2 = load i16* %ptr, align 2
  br label %bb2
bb2:
  %tmp3 = phi i16 [ 0, %entry ], [ %tmp2, %bb1 ]
  %cmp = icmp ult i16 %tmp3, 24
  br i1 %cmp, label %bb3, label %exit
bb3:
  call void @bar() nounwind
  br label %exit
exit:
  ret void
}

This compiles to the followings before:
        push    {lr}
        mov     r2, #0
        cmp     r1, #99
        bhi     LBB0_2
@ BB#1:                                 @ %bb1
        ldrh    r2, [r0]
LBB0_2:                                 @ %bb2
        uxth    r0, r2
        cmp     r0, #23
        bhi     LBB0_4
@ BB#3:                                 @ %bb3
        bl      _bar
LBB0_4:                                 @ %exit
        pop     {lr}
        bx      lr

The uxth is not needed since ldrh implicitly zero-extend the high bits. With
this change it's eliminated.

rdar://12771555

llvm-svn: 169459
2012-12-06 01:28:01 +00:00
Elena Demikhovsky cd3c1c4a16 Simplified BLEND pattern matching for shuffles.
Generate VPBLENDD for AVX2 and VPBLENDW for v16i16 type on AVX2.

llvm-svn: 169366
2012-12-05 09:24:57 +00:00
Evan Cheng d31802c1f6 Add x86 isel lowering logic to form bit test with inverted condition. e.g.
x ^ -1.

Patch by David Majnemer.
rdar://12755626

llvm-svn: 169339
2012-12-05 00:10:38 +00:00
Chandler Carruth ed0881b2a6 Use the new script to sort the includes of every file under lib.
Sooooo many of these had incorrect or strange main module includes.
I have manually inspected all of these, and fixed the main module
include to be the nearest plausible thing I could find. If you own or
care about any of these source files, I encourage you to take some time
and check that these edits were sensible. I can't have broken anything
(I strictly added headers, and reordered them, never removed), but they
may not be the headers you'd really like to identify as containing the
API being implemented.

Many forward declarations and missing includes were added to a header
files to allow them to parse cleanly when included first. The main
module rule does in fact have its merits. =]

llvm-svn: 169131
2012-12-03 16:50:05 +00:00
Shuxin Yang abcc370423 rdar://12100355 (part 1)
This revision attempts to recognize following population-count pattern:

 while(a) { c++; ... ; a &= a - 1; ... },
  where <c> and <a>could be used multiple times in the loop body.

 TODO: On X8664 and ARM, __buildin_ctpop() are not expanded to a efficent 
instruction sequence, which need to be improved in the following commits.

Reviewed by Nadav, really appreciate!

llvm-svn: 168931
2012-11-29 19:38:54 +00:00
Elena Demikhovsky eace43bff7 I changed hasAVX() to hasFp256() and hasAVX2() to hasInt256() in X86IselLowering.cpp.
The logic was not changed, only names.

llvm-svn: 168875
2012-11-29 12:44:59 +00:00
Jakub Staszak a17f3f8c30 Normalize splat 256bit vectors with 8 elements.
llvm-svn: 168600
2012-11-26 19:24:31 +00:00
Craig Topper c8c28d1ff0 Mark ISD::FMA as Legal instead of custom for x86 with FMA3/FMA4. Needed so that llvm.muladd can be converted to ISD::FMA for fp_contract.
llvm-svn: 168413
2012-11-21 05:36:24 +00:00
Duncan Sands d71b4e4568 Add the Erlang/HiPE calling convention, patch by Yiannis Tsiouris.
llvm-svn: 168166
2012-11-16 12:36:39 +00:00
Craig Topper 70601ba6f9 Use roundps/pd for llvm.ceil, llvm.trunc, llvm.rint, and llvm.nearbyint of vector types.
llvm-svn: 168141
2012-11-16 06:37:56 +00:00
Craig Topper 61d045781a Add llvm.ceil, llvm.trunc, llvm.rint, llvm.nearbyint intrinsics.
llvm-svn: 168025
2012-11-15 06:51:10 +00:00
Benjamin Kramer 6293429b51 X86: Enable SSE memory intrinsics even when stack alignment is less than 16 bytes.
The stack realignment code was fixed to work when there is stack realignment and
a dynamic alloca is present so this shouldn't cause correctness issues anymore.

Note that this also enables generation of AVX instructions for memset
under the assumptions:
- Unaligned loads/stores are always fast on CPUs supporting AVX
- AVX is not slower than SSE
We may need some tweaked heuristics if one of those assumptions turns out not to
be true.

Effectively reverts r58317. Part of PR2962.

llvm-svn: 167967
2012-11-14 20:08:40 +00:00
Craig Topper a7f489d1ab Factor out an overly replicated typecast. No functional change.
llvm-svn: 167916
2012-11-14 06:41:09 +00:00
Manman Ren 0f3240d3a7 X86: when constructing VZEXT_LOAD from other loads, makes sure its output
chain is correctly setup.

As an example, if the original load must happen before later stores, we need
to make sure the constructed VZEXT_LOAD is constrained to be before the stores.

rdar://12684358

llvm-svn: 167859
2012-11-13 19:13:05 +00:00
Michael Liao d39c0fb19f Fix PR14314
- Fix operand order for atomic sub, where the minuend is the value
  loaded from memory and the subtrahend is the parameter specified.

llvm-svn: 167718
2012-11-12 06:49:17 +00:00
Craig Topper dd13d3fda1 Move some helper methods to being static functions in the implementation file.
llvm-svn: 167696
2012-11-11 22:45:02 +00:00
Craig Topper a43e2fd3eb Remove unnecessary subtraction and addition by 1 around a couple for loops.
llvm-svn: 167673
2012-11-10 09:25:36 +00:00
Craig Topper 84afbf2b02 Tidy up spacing. No functional change.
llvm-svn: 167671
2012-11-10 09:02:47 +00:00
Craig Topper f5d527401f Simplify custom emitter code for pcmp(e/i)str(i/m) and make the helper functions static.
llvm-svn: 167669
2012-11-10 08:57:41 +00:00
Craig Topper 9268c94b15 Cleanup pcmp(e/i)str(m/i) instruction definitions and load folding support.
llvm-svn: 167652
2012-11-10 01:23:36 +00:00
Nadav Rotem d1e906e1f1 indent
llvm-svn: 167607
2012-11-09 07:02:24 +00:00
Michael Liao 73cffddb95 Add support of RTM from TSX extension
- Add RTM code generation support throught 3 X86 intrinsics:
  xbegin()/xend() to start/end a transaction region, and xabort() to abort a
  tranaction region

llvm-svn: 167573
2012-11-08 07:28:54 +00:00
Jakub Staszak 7d6ee3e1b4 Simplify code. No functionality change.
llvm-svn: 167505
2012-11-06 23:52:19 +00:00
Nadav Rotem 1c89744f32 Make the helper functions static. No functional change.
llvm-svn: 167501
2012-11-06 23:36:00 +00:00
Nadav Rotem f036ca466e CostModel: add another known vector trunc optimization.
llvm-svn: 167488
2012-11-06 21:17:17 +00:00
Nadav Rotem 0914f0b262 Cost Model: add tables for some avx type-conversion hacks.
llvm-svn: 167480
2012-11-06 19:33:53 +00:00
Nadav Rotem 48c5b8e659 Refactor the getTypeLegalizationCost interface. No functionality change.
llvm-svn: 167422
2012-11-05 23:57:45 +00:00
Nadav Rotem c378a8067d CostModel: Add tables for the common x86 compares.
llvm-svn: 167421
2012-11-05 23:48:20 +00:00
Richard Smith 18d2762048 Suppress signed/unsigned comparison warning.
llvm-svn: 167410
2012-11-05 22:01:44 +00:00
Nadav Rotem 856ffa6677 Cost Model: Normalize the insert/extract index when splitting types
llvm-svn: 167402
2012-11-05 21:12:13 +00:00
Nadav Rotem 7411623fd8 Implement the cost of abnormal x86 instruction lowering as a table.
llvm-svn: 167395
2012-11-05 19:32:46 +00:00
Nadav Rotem c2345cbe73 X86 CostModel: Add support for a some of the common arithmetic instructions for SSE4, AVX and AVX2.
llvm-svn: 167347
2012-11-03 00:39:56 +00:00
Chandler Carruth 5da3f0512e Revert the majority of the next patch in the address space series:
r165941: Resubmit the changes to llvm core to update the functions to
         support different pointer sizes on a per address space basis.

Despite this commit log, this change primarily changed stuff outside of
VMCore, and those changes do not carry any tests for correctness (or
even plausibility), and we have consistently found questionable or flat
out incorrect cases in these changes. Most of them are probably correct,
but we need to devise a system that makes it more clear when we have
handled the address space concerns correctly, and ideally each pass that
gets updated would receive an accompanying test case that exercises that
pass specificaly w.r.t. alternate address spaces.

However, from this commit, I have retained the new C API entry points.
Those were an orthogonal change that probably should have been split
apart, but they seem entirely good.

In several places the changes were very obvious cleanups with no actual
multiple address space code added; these I have not reverted when
I spotted them.

In a few other places there were merge conflicts due to a cleaner
solution being implemented later, often not using address spaces at all.
In those cases, I've preserved the new code which isn't address space
dependent.

This is part of my ongoing effort to clean out the partial address space
code which carries high risk and low test coverage, and not likely to be
finished before the 3.2 release looms closer. Duncan and I would both
like to see the above issues addressed before we return to these
changes.

llvm-svn: 167222
2012-11-01 09:14:31 +00:00
Shuxin Yang 01efdd6c28 (For X86) Enhancement to add-carray/sub-borrow (adc/sbb) optimization.
The adc/sbb optimization is to able to convert following expression
into a single adc/sbb instruction:
  (ult) ... = x + 1 // where the ult is unsigned-less-than comparison
  (ult) ... = x - 1

  This change is to flip the "x >u y" (i.e. ugt comparison) in order 
to expose the adc/sbb opportunity.

llvm-svn: 167180
2012-10-31 23:11:48 +00:00
Michael Liao e2d7e4e8e5 Clean up redundant SP register maintained in X86 TLI
llvm-svn: 167104
2012-10-31 04:14:09 +00:00
Manman Ren acb8becc73 X86 MMX: optimize transfer from mmx to i32
We used to generate a store (movq) + a load.
Now we use movd.

rdar://9946746

llvm-svn: 167056
2012-10-30 22:15:38 +00:00
Jakub Staszak a3d8e9974a Re-commit r166971. I reverted it to quickly, when buildbots didn't have a chance
to test it with chapni's fix (-mattr=+avx).

llvm-svn: 166985
2012-10-30 00:01:57 +00:00
Jakub Staszak d74cb61d86 Revert r166971. It causes buildbot failure. To be investigated.
llvm-svn: 166979
2012-10-29 23:13:50 +00:00
Jakub Staszak c3a92131dc Remove unused variable.
llvm-svn: 166973
2012-10-29 22:04:32 +00:00
Jakub Staszak 9c361bdfeb Simplify code. No functionality change.
llvm-svn: 166972
2012-10-29 22:02:26 +00:00
Jakub Staszak c8f4825ba6 Allow to fold vector load if there is more than one bitcast, so in the case:
%0 = load <8 x i16>* %dest
%1 = shufflevector <8 x i16> %0, <8 x i16> %in,
      <8 x i32> < i32 0, i32 1, i32 2, i32 3, i32 13, i32 undef, i32 14, i32 14>
store <8 x i16> %1, <8 x i16>* %dest

We get:
  vmovlpd (%eax), %xmm0, %xmm0

instead of:
  vmovaps (%eax), %xmm1
  vmovsd  %xmm1, %xmm0, %xmm0

No extra test-case is added. I just fixed the existing one
(also it uses FileCheck now).

llvm-svn: 166971
2012-10-29 21:56:35 +00:00
Duncan Sands ac8448e0d0 Silence a GCC warning about comparing signed and unsigned types.
llvm-svn: 166922
2012-10-29 11:29:53 +00:00
Michael Liao 6d810bd9b8 Clean up where SlotSize should be used instead of pointer size.
llvm-svn: 166664
2012-10-25 06:29:14 +00:00
Michael Liao c5af149e70 Add custom conversion from v2u32 to v2f32 in 32-bit mode
- As there's no 64-bit GPRs in 32-bit mode, a custom conversion from v2u32 to
  v2f32 is added to improve the efficiency of the code generated.

llvm-svn: 166545
2012-10-24 04:09:32 +00:00
Michael Liao 2843625bb5 Fix PR14161
- Check index being extracted to be constant 0 before simplfiying.
  Otherwise, retain the original sequence.

llvm-svn: 166504
2012-10-23 21:40:15 +00:00
Matt Beaumont-Gay bdcebd323a Silence -Wsign-compare
llvm-svn: 166494
2012-10-23 19:46:36 +00:00
Michael Liao c03c03d56e Add custom UINT_TO_FP from v4i8/v4i16/v8i8/v8i16 to v4f32/v8f32
- Replace v4i8/v8i8 -> v8f32 DAG combine with custom lowering to reduce
  DAG combine overhead.
- Extend the support to v4i16/v8i16 as well.

llvm-svn: 166487
2012-10-23 17:36:08 +00:00
Michael Liao 1be96bb5ce Enable lowering ZERO_EXTEND/ANY_EXTEND to PMOVZX from SSE4.1
llvm-svn: 166486
2012-10-23 17:34:00 +00:00