1652 Commits

Author SHA1 Message Date
Bruno Cardoso Lopes c53dd2ac01 Fix comment!
llvm-svn: 137521
2011-08-12 21:54:42 +00:00
Bruno Cardoso Lopes f15dfe5818 The VPERM2F128 is an AVX instruction which permutes between two 256-bit
vectors. It operates on 128-bit elements instead of regular scalar
types. Recognize shuffles that are suitable for VPERM2F128 and teach
the x86 legalizer how to handle them.

llvm-svn: 137519
2011-08-12 21:48:26 +00:00
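
The selection VPERM2F128 performs on 128-bit halves, as described in the commit above, can be sketched in Python. This is a minimal model, not LLVM code: the function name `vperm2f128`, the `(low, high)` tuple representation, and the use of `None` for a zeroed half are illustrative assumptions. The immediate layout follows the instruction's documented encoding (bits 1:0 and 5:4 select a half, bits 3 and 7 zero the result half).

```python
def vperm2f128(src1, src2, imm8):
    """Model VPERM2F128 on (low, high) pairs of 128-bit halves."""
    # Selector values 0..3 index into the four available source halves.
    halves = [src1[0], src1[1], src2[0], src2[1]]

    def pick(ctrl):
        # Bit 3 of each 4-bit control field zeroes that destination half
        # (modelled here as None).
        if ctrl & 0x8:
            return None
        return halves[ctrl & 0x3]

    # Low nibble controls the low destination half, high nibble the high half.
    return (pick(imm8 & 0xF), pick((imm8 >> 4) & 0xF))


if __name__ == "__main__":
    a = ("a.lo", "a.hi")
    b = ("b.lo", "b.hi")
    print(vperm2f128(a, b, 0x21))  # high half of a, then low half of b
```

A shuffle is a candidate for this instruction only when each 128-bit half of the result comes whole from one half of either input, which is why the model never looks inside a half.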
Bruno Cardoso Lopes 8fbf023c9b Add a dag combine to xform 256-bit shuffles into simple vector
inserts and extracts. This simple combine makes us generate only 1
instruction instead of 11 in the v8 case.

llvm-svn: 137362
2011-08-11 21:50:44 +00:00
Bruno Cardoso Lopes 043c820800 Fix PR10492 by teaching MOVHLPS and MOVLPS mask matching to be more strict.
llvm-svn: 137324
2011-08-11 18:59:13 +00:00
Nadav Rotem efdd183f52 Add a comment, per Bruno's CR.
llvm-svn: 137313
2011-08-11 17:05:47 +00:00
Nadav Rotem 1542d5a00a [AVX] If the data which is going to be saved is already in two XMM registers
(for example, after integer operation), do not pack the registers into a YMM
before saving. It's better to save as two XMM registers.

Before:
                vinsertf128         $1, %xmm3, %ymm0, %ymm3
                vinsertf128         $0, %xmm1, %ymm3, %ymm1
                vmovaps              %ymm1, 416(%rsp)

After:
                vmovaps              %xmm3, 416+16(%rsp)
                vmovaps              %xmm1, 416(%rsp)

llvm-svn: 137308
2011-08-11 16:41:21 +00:00
Bruno Cardoso Lopes a2d8bb97b9 Splats for v8i32/v8f32 can be handled by VPERMILPSY. This was causing
infinite recursive calls in legalize. Fix PR10562

llvm-svn: 137296
2011-08-11 02:49:44 +00:00
Bruno Cardoso Lopes 572c9aaf53 Use the splat index to generate the desired shuffle. Otherwise we
could only get undefs and the vector shuffle becomes an undef,
generating wrong code.

llvm-svn: 137295
2011-08-11 02:49:41 +00:00
Eli Friedman 3ae39f8ad1 Fix X86TargetLowering::LowerExternalSymbol so that it actually works in non-trivial cases. This hasn't been an issue before because the function isn't normally called (but apparently is used to generate a tail-call to sin() on ELF x86-32 with PIC and SSE2).
Fixes PR9693.

llvm-svn: 137292
2011-08-11 01:48:05 +00:00
Nadav Rotem 410a11fe82 When performing a truncating store, it is sometimes possible to rearrange the
data in-register prior to saving to memory. Reordering the data in-register
avoids saving multiple scalars to memory and lets us emit a single regular
store.

llvm-svn: 137238
2011-08-10 19:30:14 +00:00
Bruno Cardoso Lopes 278ffd7d8e Fix a bug in vpermilps mask checking. Fix PR10560
llvm-svn: 137194
2011-08-10 01:54:17 +00:00
Bruno Cardoso Lopes 72323966c8 Add 256-bit support for v8i32, v4i64 and v4f64 ISD::SELECT. Fix PR10556
llvm-svn: 137179
2011-08-09 23:27:13 +00:00
Bruno Cardoso Lopes 6963062a99 Use fp unpack instructions to unpack int types. Until we have AVX2, this
is the best we can do for these patterns. This fixes PR10554.

llvm-svn: 137161
2011-08-09 22:18:37 +00:00
Bruno Cardoso Lopes 24dd1d4a27 Revert r137114
llvm-svn: 137127
2011-08-09 17:39:01 +00:00
Bruno Cardoso Lopes ad3453cf2d Handle sitofp between v4f64 <- v4i32. Fix PR10559
llvm-svn: 137114
2011-08-09 05:48:01 +00:00
Bruno Cardoso Lopes af6a85484c Make LowerVSETCC aware of AVX types and add patterns to match them.
llvm-svn: 137090
2011-08-09 00:46:57 +00:00
Bruno Cardoso Lopes c96953c12a Add support for several vector shifts operations while in AVX mode. Fix PR10581
llvm-svn: 137067
2011-08-08 21:31:08 +00:00
Evan Cheng 19e3f80579 Fix an obvious type. Patch by Ivan Krasin.
llvm-svn: 136899
2011-08-04 18:38:15 +00:00
Bill Wendling e234f6ae0c Only access both operands of an INSERT_SUBVECTOR if it is an INSERT_SUBVECTOR.
Fixes PR10527.

llvm-svn: 136853
2011-08-04 00:32:58 +00:00
Benjamin Kramer 103e2ec2df Remove unused variables.
llvm-svn: 136803
2011-08-03 19:53:48 +00:00
Eli Friedman 04c5025cd5 Don't create a ridiculous EXTRACT_ELEMENT. PR10563.
The testcase looks extremely fragile, so I'm adding an assertion which should catch any cases like this.

llvm-svn: 136711
2011-08-02 18:38:35 +00:00
Bruno Cardoso Lopes 5ada908140 Support this kind of lowering with 256-bit instructions:
  shuffle (scalar_to_vector (load (ptr + 4))), undef, <0, 0, 0, 0>
To:
  shuffle (vload ptr), undef, <1, 1, 1, 1>
Fix PR10494

llvm-svn: 136691
2011-08-02 16:06:18 +00:00
Bruno Cardoso Lopes a8e3673816 Add v4f64 -> v2f32 fp_round support. Also add a testcase to exercise
the legalizer. This commit together with the two previous ones fixes
PR10495.

llvm-svn: 136654
2011-08-01 21:54:09 +00:00
Bruno Cardoso Lopes 616fe60548 Teach PreprocessISelDAG to be aware of vector types and to not process them.
llvm-svn: 136653
2011-08-01 21:54:05 +00:00
Bruno Cardoso Lopes bd30a4b584 Lower CONCAT_VECTORS to use two VINSERTF128 instructions instead of
using a stack store.

llvm-svn: 136652
2011-08-01 21:54:02 +00:00
Bruno Cardoso Lopes 7513939ddd Since vectors with all ones can't be created with a 256-bit instruction,
avoid returning early for v8i32 types, which would only be valid for
vector with all zeros. Also split the handling of zeros and ones into separate
checking logic since they are handled differently. This fixes PR10547

llvm-svn: 136642
2011-08-01 19:51:53 +00:00
Eli Friedman adec587d5c Misc optimizer+codegen work for 'cmpxchg' and 'atomicrmw'. They appear to be
working on x86 (at least for trivial testcases); other architectures will
need more work so that they actually emit the appropriate instructions for
orderings stricter than 'monotonic'. (As far as I can tell, the ARM, PPC,
Mips, and Alpha backends need such changes.)

llvm-svn: 136457
2011-07-29 03:05:32 +00:00
Bruno Cardoso Lopes 65ce5ea3ba Fix two tests that I crashed in the previous commits. The mask elts
on the second half must be reindexed.

llvm-svn: 136454
2011-07-29 02:05:28 +00:00
Bruno Cardoso Lopes 81eb193f2e Match VPERMIL masks more strictly and update the target specific mask
generation to always catch the weird cases.

llvm-svn: 136453
2011-07-29 01:31:15 +00:00
Bruno Cardoso Lopes 795f558532 Add DecodeShuffle shuffle support for VPERMILPD variants
llvm-svn: 136452
2011-07-29 01:31:11 +00:00
Bruno Cardoso Lopes c00f6728bc Fix a bug while generating target specific VPERMIL masks: skip
undef mask elements. This fixes PR10529.

llvm-svn: 136450
2011-07-29 01:31:04 +00:00
Bruno Cardoso Lopes b9ba465de8 Enable usage of SSE4 extracts and inserts in their 128-bit AVX forms.
Also tidy up code a bit.

llvm-svn: 136449
2011-07-29 01:31:02 +00:00
Bruno Cardoso Lopes 6aee388423 Cleanup PALIGNR handling and remove the old palign pattern fragment.
Also make PALIGNR masks not match 256-bit vectors, which aren't supported.
This is also a step toward solving PR10489.

llvm-svn: 136448
2011-07-29 01:30:59 +00:00
Bruno Cardoso Lopes 8c19a8b5d5 Invert the subvector insertion to be more likely to be taken as a COPY
llvm-svn: 136324
2011-07-28 01:26:53 +00:00
Bruno Cardoso Lopes 9e2a301216 Add SINT_TO_FP and FP_TO_SINT support for v8i32 types. Also move
a convert pattern close to the instruction definition.

llvm-svn: 136320
2011-07-28 01:26:39 +00:00
Eli Friedman 26a484852e Code generation for 'fence' instruction.
llvm-svn: 136283
2011-07-27 22:21:52 +00:00
Jeffrey Yasskin 6381c0100b Explicitly cast narrowing conversions inside {}s that will become errors in
C++0x.

llvm-svn: 136211
2011-07-27 06:22:51 +00:00
Bruno Cardoso Lopes f9324f4f6b Move some code around to open opportunity for more shuffle matching
llvm-svn: 136201
2011-07-27 00:56:37 +00:00
Bruno Cardoso Lopes 27a30a7792 The vpermilps and vpermilpd have different behaviour regarding the
usage of the shuffle bitmask. Both work in 128-bit lanes without
crossing, but in the former the mask of the high part is the same
used by the low part while in the latter both lanes have independent
masks. Handle this properly and add support for vpermilpd.

llvm-svn: 136200
2011-07-27 00:56:34 +00:00
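
The difference between the two immediate forms described in the commit above can be sketched in Python. This is an illustrative model only: the function names and the list-of-elements representation are assumptions, not LLVM code. For `vpermilps` the four 2-bit selectors in the immediate are reused by both 128-bit lanes; for `vpermilpd` each of the four elements has its own selector bit, so the two lanes are controlled independently.

```python
def vpermilps_256(src, imm8):
    """8 x f32: four 2-bit selectors, the same ones applied to each lane."""
    out = []
    for lane_start in (0, 4):              # two 128-bit lanes of 4 floats
        for i in range(4):
            sel = (imm8 >> (2 * i)) & 0x3  # selector i is shared by both lanes
            out.append(src[lane_start + sel])
    return out


def vpermilpd_256(src, imm8):
    """4 x f64: one selector bit per element, independent per lane."""
    out = []
    for i in range(4):
        lane_start = (i // 2) * 2          # two 128-bit lanes of 2 doubles
        sel = (imm8 >> i) & 0x1            # bit i belongs to element i only
        out.append(src[lane_start + sel])
    return out


if __name__ == "__main__":
    print(vpermilps_256(list(range(8)), 0x1B))
    print(vpermilpd_256(list(range(4)), 0b0110))
```

Neither form moves an element across a 128-bit lane boundary, which matches the "without crossing" remark in the commit message.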
Benjamin Kramer 124ac2b997 Add a neat little two's complement hack for x86.
On x86 we can't encode an immediate LHS of a sub directly. If the RHS comes from a XOR with a constant we can
fold the negation into the xor and add one to the immediate of the sub. Then we can turn the sub into an add,
which can be commuted and encoded efficiently.

This code is generated for __builtin_clz and friends.

llvm-svn: 136167
2011-07-26 22:42:13 +00:00
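
The arithmetic behind the hack in the commit above can be checked with a short Python sketch. It rests on two two's complement identities: `-y == ~y + 1` and `~(x ^ k) == x ^ ~k`, which together give `c - (x ^ k) == (x ^ ~k) + (c + 1)`. The function names and the 32-bit width are assumptions chosen for illustration; this is not the DAG combine itself.

```python
MASK = 0xFFFFFFFF  # model 32-bit wraparound arithmetic


def sub_of_xor(c, x, k):
    """Original form: immediate on the LHS of a sub, RHS is x ^ k."""
    return (c - (x ^ k)) & MASK


def add_of_xor(c, x, k):
    """Rewritten form: negation folded into the xor, sub turned into an add."""
    return ((x ^ (~k & MASK)) + c + 1) & MASK


if __name__ == "__main__":
    for x in (0, 1, 5, 31, 0x80000000, MASK):
        assert sub_of_xor(32, x, 31) == add_of_xor(32, x, 31)
    print("identity holds for the sampled values")
```

The rewritten form has the immediate on an add, which is commutative, so the constant can sit in the operand position that x86 is able to encode.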
Bruno Cardoso Lopes f8fe47bd2b Recognize unpckh* masks and match 256-bit versions. The new versions are
different from the previous 128-bit because they work in lanes.
Update a few comments and add testcases

llvm-svn: 136157
2011-07-26 22:03:40 +00:00
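
What "work in lanes" means for the 256-bit unpckh masks in the commit above can be sketched in Python. The helper name `unpckh_mask` and its parameters are hypothetical, chosen for illustration; indices below `num_elts` refer to the first shuffle operand and indices from `num_elts` upward refer to the second, as in a two-input vector shuffle mask.

```python
def unpckh_mask(num_elts, lane_elts):
    """Shuffle mask that interleaves the high half of each 128-bit lane."""
    mask = []
    half = lane_elts // 2
    for lane_start in range(0, num_elts, lane_elts):
        for i in range(half):
            idx = lane_start + half + i
            mask.append(idx)             # element from the first operand
            mask.append(idx + num_elts)  # matching element from the second
    return mask


if __name__ == "__main__":
    print(unpckh_mask(4, 4))  # 128-bit, 4 x f32: a single lane
    print(unpckh_mask(8, 4))  # 256-bit, 8 x f32: two lanes of 4
    print(unpckh_mask(4, 2))  # 256-bit, 4 x f64: two lanes of 2
```

For eight floats the per-lane mask differs from simply interleaving the top four elements of each operand, which would start at index 4 rather than index 2.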
Eli Friedman 93dc04d5ca Prevent x86-specific DAGCombine from creating nodes with illegal type (which could not be selected). Fixes a minor isel issue that was breaking the testcase from r136130.
llvm-svn: 136148
2011-07-26 21:02:58 +00:00
Bruno Cardoso Lopes d77b383199 More movsldup/movshdup cleanup. Rewrite the mask matching function and add
support for 256-bit versions (but no instruction selection yet, coming next).

llvm-svn: 136050
2011-07-26 02:39:28 +00:00
Bruno Cardoso Lopes 5b268a4b82 More cleanup, subtarget info isn't used here.
llvm-svn: 136049
2011-07-26 02:39:25 +00:00
Bruno Cardoso Lopes 9212bf275d Codegen allonesvector better while using AVX: vpcmpeqd + vinsertf128
This also fixes PR10452

llvm-svn: 136004
2011-07-25 23:05:32 +00:00
Bruno Cardoso Lopes 123dff0f58 - Handle special scalar_to_vector case: splats. Using a native 128-bit
shuffle before inserting on a 256-bit vector.
- Add AVX versions of movd/movq instructions
- Introduce a few COPY patterns to match insert_subvector instructions.
This turns a trivial insert_subvector instruction into a register copy,
coalescing the xmm into a ymm and avoiding emitting one more instruction.

llvm-svn: 136002
2011-07-25 23:05:25 +00:00
Bruno Cardoso Lopes 276eb8debf Reintroduce r135730, this is indeed the right approach, there is no
native 256-bit vector instruction to do scalar_to_vector.

llvm-svn: 136001
2011-07-25 23:05:16 +00:00
Eli Friedman ea8c66fea5 Get rid of an incorrect optimization for shuffles with PALIGNR and simplify isPALIGNRMask.
Addresses PR10466, although the crash from that PR only triggers in cases where DAGCombine misses optimizing a shuffle.

llvm-svn: 135980
2011-07-25 21:36:45 +00:00
Rafael Espindola 77242dd537 Turn shuffles into unpacks for VT == MVT::v2i64 and MVT::v2f64
too. Patch by Jeff Muizelaar.

llvm-svn: 135789
2011-07-22 18:56:05 +00:00
Dan Gohman c535278cf1 Fix x86's XALUO lowering to return its replacement values instead
of doing the RAUW calls for the overflow value itself. This makes
it more consistent with how the rest of LegalizeDAG works.

llvm-svn: 135788
2011-07-22 18:45:15 +00:00