Bruno Cardoso Lopes
c53dd2ac01
Fix comment!
llvm-svn: 137521
2011-08-12 21:54:42 +00:00
Bruno Cardoso Lopes
f15dfe5818
VPERM2F128 is an AVX instruction that permutes between two 256-bit
vectors. It operates on 128-bit elements instead of regular scalar
types. Recognize shuffles that are suitable for VPERM2F128 and teach
the x86 legalizer how to handle them.
llvm-svn: 137519
2011-08-12 21:48:26 +00:00
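To make the "128-bit elements" point in the VPERM2F128 commit above concrete, here is a minimal sketch of the instruction's immediate semantics as documented for AVX. It is not LLVM code: `Half`, `Vec256`, `selectHalf` and `vperm2f128` are hypothetical names, and each 128-bit half is modelled as an integer tag (0 stands for a zeroed half).

```cpp
#include <array>
#include <cassert>
#include <cstdint>

// Hypothetical model: a 256-bit vector as two opaque 128-bit halves.
// Each half is an integer tag; the tag 0 is reserved for "all zeros".
using Half = int;
using Vec256 = std::array<Half, 2>; // [0] = low 128 bits, [1] = high 128 bits

// Pick one 128-bit half from a 4-bit control field.
// Bits 1:0 choose among {a.low, a.high, b.low, b.high}; bit 3 zeroes the half.
Half selectHalf(const Vec256 &a, const Vec256 &b, unsigned ctrl) {
  assert(ctrl < 16 && "control field is four bits wide");
  if (ctrl & 0x8)
    return 0;
  switch (ctrl & 0x3) {
  case 0: return a[0];
  case 1: return a[1];
  case 2: return b[0];
  default: return b[1];
  }
}

// The low nibble of the immediate controls the low half of the result,
// the high nibble controls the high half.
Vec256 vperm2f128(const Vec256 &a, const Vec256 &b, std::uint8_t imm) {
  return {selectHalf(a, b, imm & 0xF), selectHalf(a, b, (imm >> 4) & 0xF)};
}
```

With `a = {1, 2}` and `b = {3, 4}`, an immediate of `0x20` yields `{1, 3}`, that is, the low halves of both inputs.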
Bruno Cardoso Lopes
8fbf023c9b
Add a dag combine to xform 256-bit shuffles into simple vector
inserts and extracts. This simple combine makes us generate only 1
instruction instead of 11 in the v8 case.
llvm-svn: 137362
2011-08-11 21:50:44 +00:00
Bruno Cardoso Lopes
043c820800
Fix PR10492 by teaching MOVHLPS and MOVLPS mask matching to be more strict.
llvm-svn: 137324
2011-08-11 18:59:13 +00:00
Nadav Rotem
efdd183f52
Add a comment, per Bruno's CR.
llvm-svn: 137313
2011-08-11 17:05:47 +00:00
Nadav Rotem
1542d5a00a
[AVX] If the data to be saved is already in two XMM registers
(for example, after an integer operation), do not pack the registers into a YMM
before saving. It's better to save it as two XMM registers.
Before:
vinsertf128 $1, %xmm3, %ymm0, %ymm3
vinsertf128 $0, %xmm1, %ymm3, %ymm1
vmovaps %ymm1, 416(%rsp)
After:
vmovaps %xmm3, 416+16(%rsp)
vmovaps %xmm1, 416(%rsp)
llvm-svn: 137308
2011-08-11 16:41:21 +00:00
Bruno Cardoso Lopes
a2d8bb97b9
Splats for v8i32/v8f32 can be handled by VPERMILPSY. This was causing
infinite recursive calls in legalize. Fix PR10562
llvm-svn: 137296
2011-08-11 02:49:44 +00:00
Bruno Cardoso Lopes
572c9aaf53
Use the splat index to generate the desired shuffle. Otherwise we
could only get undefs and the vector shuffle becomes an undef,
generating wrong code.
llvm-svn: 137295
2011-08-11 02:49:41 +00:00
Eli Friedman
3ae39f8ad1
Fix X86TargetLowering::LowerExternalSymbol so that it actually works in non-trivial cases. This hasn't been an issue before because the function isn't normally called (but apparently is used to generate a tail-call to sin() on ELF x86-32 with PIC and SSE2).
Fixes PR9693.
llvm-svn: 137292
2011-08-11 01:48:05 +00:00
Nadav Rotem
410a11fe82
When performing a truncating store, it is sometimes possible to rearrange the
data in-register prior to saving to memory. When we reorder the data in-register,
we avoid the need to save multiple scalars to memory and can issue a single regular
store.
llvm-svn: 137238
2011-08-10 19:30:14 +00:00
Bruno Cardoso Lopes
278ffd7d8e
Fix a bug in vpermilps mask checking. Fix PR10560
llvm-svn: 137194
2011-08-10 01:54:17 +00:00
Bruno Cardoso Lopes
72323966c8
Add 256-bit support for v8i32, v4i64 and v4f64 ISD::SELECT. Fix PR10556
llvm-svn: 137179
2011-08-09 23:27:13 +00:00
Bruno Cardoso Lopes
6963062a99
Use fp unpack instructions to unpack int types. Until we have AVX2, this
is the best we can do for these patterns. This fixes PR10554.
llvm-svn: 137161
2011-08-09 22:18:37 +00:00
Bruno Cardoso Lopes
24dd1d4a27
Revert r137114
llvm-svn: 137127
2011-08-09 17:39:01 +00:00
Bruno Cardoso Lopes
ad3453cf2d
Handle sitofp between v4f64 <- v4i32. Fix PR10559
llvm-svn: 137114
2011-08-09 05:48:01 +00:00
Bruno Cardoso Lopes
af6a85484c
Make LowerVSETCC aware of AVX types and add patterns to match them.
llvm-svn: 137090
2011-08-09 00:46:57 +00:00
Bruno Cardoso Lopes
c96953c12a
Add support for several vector shifts operations while in AVX mode. Fix PR10581
llvm-svn: 137067
2011-08-08 21:31:08 +00:00
Evan Cheng
19e3f80579
Fix an obvious typo. Patch by Ivan Krasin.
llvm-svn: 136899
2011-08-04 18:38:15 +00:00
Bill Wendling
e234f6ae0c
Only access both operands of an INSERT_SUBVECTOR if it is an INSERT_SUBVECTOR.
Fixes PR10527.
llvm-svn: 136853
2011-08-04 00:32:58 +00:00
Benjamin Kramer
103e2ec2df
Remove unused variables.
llvm-svn: 136803
2011-08-03 19:53:48 +00:00
Eli Friedman
04c5025cd5
Don't create a ridiculous EXTRACT_ELEMENT. PR10563.
The testcase looks extremely fragile, so I'm adding an assertion which should catch any cases like this.
llvm-svn: 136711
2011-08-02 18:38:35 +00:00
Bruno Cardoso Lopes
5ada908140
Make this kind of lowering supported by 256-bit instructions:
shuffle (scalar_to_vector (load (ptr + 4))), undef, <0, 0, 0, 0>
To:
shuffle (vload ptr), undef, <1, 1, 1, 1>
Fix PR10494
llvm-svn: 136691
2011-08-02 16:06:18 +00:00
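The rewrite in the commit above relies on a byte offset of 4 into a vector of 4-byte elements naming element 1. A minimal sketch of that equivalence follows, assuming four `float` elements and that the whole vector at `ptr` is safe to load. `V4`, `splatScalarLoad` and `splatVectorLoad` are hypothetical names for illustration, not LLVM functions.

```cpp
#include <array>
#include <cassert>
#include <cstddef>

using V4 = std::array<float, 4>;

// "Before" form: load one scalar at a byte offset, then splat it
// (scalar_to_vector followed by a <0, 0, 0, 0> shuffle).
V4 splatScalarLoad(const float *ptr, std::size_t byteOffset) {
  assert(byteOffset % sizeof(float) == 0 && "offset must be element-aligned");
  float s = ptr[byteOffset / sizeof(float)];
  return {s, s, s, s};
}

// "After" form: load the whole vector at ptr, then splat the element that
// the byte offset pointed at (a <idx, idx, idx, idx> shuffle).
V4 splatVectorLoad(const float *ptr, std::size_t byteOffset) {
  assert(byteOffset % sizeof(float) == 0 && "offset must be element-aligned");
  std::size_t idx = byteOffset / sizeof(float);
  assert(idx < 4 && "offset must fall inside the loaded vector");
  V4 v = {ptr[0], ptr[1], ptr[2], ptr[3]};
  return {v[idx], v[idx], v[idx], v[idx]};
}
```

For any in-range, element-aligned offset the two functions return the same vector; with an offset of 4 bytes both splat element 1.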
Bruno Cardoso Lopes
a8e3673816
Add v4f64 -> v2f32 fp_round support. Also add a testcase to exercise
the legalizer. This commit together with the two previous ones fixes
PR10495.
llvm-svn: 136654
2011-08-01 21:54:09 +00:00
Bruno Cardoso Lopes
616fe60548
Teach PreprocessISelDAG to be aware of vector types and to not process them.
llvm-svn: 136653
2011-08-01 21:54:05 +00:00
Bruno Cardoso Lopes
bd30a4b584
Lower CONCAT_VECTORS to use two VINSERTF128 instructions instead of
using a stack store.
llvm-svn: 136652
2011-08-01 21:54:02 +00:00
Bruno Cardoso Lopes
7513939ddd
Since vectors with all ones can't be created with a 256-bit instruction,
avoid returning early for v8i32 types, which would only be valid for
vectors with all zeros. Also split the handling of zeros and ones into separate
checking logic since they are handled differently. This fixes PR10547.
llvm-svn: 136642
2011-08-01 19:51:53 +00:00
Eli Friedman
adec587d5c
Misc optimizer+codegen work for 'cmpxchg' and 'atomicrmw'. They appear to be
working on x86 (at least for trivial testcases); other architectures will
need more work so that they actually emit the appropriate instructions for
orderings stricter than 'monotonic'. (As far as I can tell, the ARM, PPC,
Mips, and Alpha backends need such changes.)
llvm-svn: 136457
2011-07-29 03:05:32 +00:00
Bruno Cardoso Lopes
65ce5ea3ba
Fix two tests that I broke in the previous commits. The mask elts
in the second half must be reindexed.
llvm-svn: 136454
2011-07-29 02:05:28 +00:00
Bruno Cardoso Lopes
81eb193f2e
Match VPERMIL masks more strictly and update the target specific mask
generation to always catch the weird cases.
llvm-svn: 136453
2011-07-29 01:31:15 +00:00
Bruno Cardoso Lopes
795f558532
Add DecodeShuffle shuffle support for VPERMILPD variants
llvm-svn: 136452
2011-07-29 01:31:11 +00:00
Bruno Cardoso Lopes
c00f6728bc
Fix a bug while generating target specific VPERMIL masks: skip
undef mask elements. This fixes PR10529.
llvm-svn: 136450
2011-07-29 01:31:04 +00:00
Bruno Cardoso Lopes
b9ba465de8
Enable usage of SSE4 extracts and inserts in their 128-bit AVX forms.
Also tidy up code a bit.
llvm-svn: 136449
2011-07-29 01:31:02 +00:00
Bruno Cardoso Lopes
6aee388423
Cleanup PALIGNR handling and remove the old palign pattern fragment.
Also make PALIGNR masks not match 256-bit vectors, which aren't supported.
It's also a step toward solving PR10489.
llvm-svn: 136448
2011-07-29 01:30:59 +00:00
Bruno Cardoso Lopes
8c19a8b5d5
Invert the subvector insertion to be more likely to be taken as a COPY
llvm-svn: 136324
2011-07-28 01:26:53 +00:00
Bruno Cardoso Lopes
9e2a301216
Add SINT_TO_FP and FP_TO_SINT support for v8i32 types. Also move
a convert pattern close to the instruction definition.
llvm-svn: 136320
2011-07-28 01:26:39 +00:00
Eli Friedman
26a484852e
Code generation for 'fence' instruction.
llvm-svn: 136283
2011-07-27 22:21:52 +00:00
Jeffrey Yasskin
6381c0100b
Explicitly cast narrowing conversions inside {}s that will become errors in
C++0x.
llvm-svn: 136211
2011-07-27 06:22:51 +00:00
Bruno Cardoso Lopes
f9324f4f6b
Move some code around to open opportunity for more shuffle matching
llvm-svn: 136201
2011-07-27 00:56:37 +00:00
Bruno Cardoso Lopes
27a30a7792
vpermilps and vpermilpd behave differently in how they
use the shuffle bitmask. Both work in 128-bit lanes without
crossing, but in the former the mask of the high part is the same
one used by the low part, while in the latter both lanes have independent
masks. Handle this properly and add support for vpermilpd.
llvm-svn: 136200
2011-07-27 00:56:34 +00:00
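The lane behaviour described in the vpermilps/vpermilpd commit above can be sketched as a scalar model of the immediate forms of the two 256-bit instructions. This is an illustration under the documented AVX semantics, not the LLVM mask-matching code; `vpermilps256` and `vpermilpd256` are hypothetical names, and the assertion on the upper immediate bits is a restriction of this sketch only.

```cpp
#include <array>
#include <cassert>
#include <cstdint>

// 256-bit VPERMILPS with an immediate: four 2-bit selectors, and the same
// four selectors are applied to both 128-bit lanes. No element crosses a lane.
std::array<float, 8> vpermilps256(const std::array<float, 8> &src,
                                  std::uint8_t imm) {
  std::array<float, 8> dst{};
  for (unsigned i = 0; i != 8; ++i) {
    unsigned lane = i / 4;                     // 0 = low lane, 1 = high lane
    unsigned sel = (imm >> (2 * (i % 4))) & 3; // selector shared by both lanes
    dst[i] = src[lane * 4 + sel];
  }
  return dst;
}

// 256-bit VPERMILPD with an immediate: one selector bit per destination
// element, so the high lane has its own bits, independent of the low lane.
std::array<double, 4> vpermilpd256(const std::array<double, 4> &src,
                                   std::uint8_t imm) {
  assert((imm >> 4) == 0 && "this sketch only models the low four bits");
  std::array<double, 4> dst{};
  for (unsigned i = 0; i != 4; ++i) {
    unsigned lane = i / 2;
    unsigned sel = (imm >> i) & 1; // bit i belongs to element i alone
    dst[i] = src[lane * 2 + sel];
  }
  return dst;
}
```

For example, `vpermilps256` with immediate `0x1B` reverses the elements inside each lane, while `vpermilpd256` with immediate `0x6` swaps only the two elements of the high lane.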
Benjamin Kramer
124ac2b997
Add a neat little two's complement hack for x86.
On x86 we can't encode an immediate LHS of a sub directly. If the RHS comes from a XOR with a constant we can
fold the negation into the xor and add one to the immediate of the sub. Then we can turn the sub into an add,
which can be commuted and encoded efficiently.
This code is generated for __builtin_clz and friends.
llvm-svn: 136167
2011-07-26 22:42:13 +00:00
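The arithmetic behind the two's complement hack in the commit above can be checked directly. Since -y == ~y + 1 and ~(x ^ k) == x ^ ~k, the expression c - (x ^ k) equals (x ^ ~k) + (c + 1) modulo 2^32. The sketch below only demonstrates that identity on 32-bit unsigned values. The function names are hypothetical and this is not the DAG combine itself.

```cpp
#include <cassert>
#include <cstdint>

// Original form: an immediate on the left-hand side of a subtraction.
std::uint32_t subOfXor(std::uint32_t c, std::uint32_t x, std::uint32_t k) {
  return c - (x ^ k);
}

// Rewritten form: the negation is folded into the xor constant and one is
// added to the immediate, turning the sub into a commutable add.
std::uint32_t addOfXor(std::uint32_t c, std::uint32_t x, std::uint32_t k) {
  return (x ^ ~k) + (c + 1);
}

// Spot-check the identity for one input; aborts if the two forms disagree.
void checkIdentity(std::uint32_t c, std::uint32_t x, std::uint32_t k) {
  assert(subOfXor(c, x, k) == addOfXor(c, x, k));
}
```

Unsigned arithmetic is used so that the wraparound in `c + 1` and in the final addition is well defined in C++.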
Bruno Cardoso Lopes
f8fe47bd2b
Recognize unpckh* masks and match 256-bit versions. The new versions are
different from the previous 128-bit ones because they work in lanes.
Update a few comments and add testcases.
llvm-svn: 136157
2011-07-26 22:03:40 +00:00
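To illustrate what "work in lanes" means for the unpckh* commit above, here is a minimal sketch that builds the shuffle mask an unpack-high would correspond to, interleaving the upper half of each 128-bit lane of the two inputs. `unpckhMask` is a hypothetical helper written for this example and is not the mask-matching function in the x86 backend. Indices at or above `numElts` refer to the second input, following the usual shuffle-mask convention.

```cpp
#include <cassert>
#include <vector>

// Build the shuffle mask for an unpack-high over numElts elements split into
// numLanes 128-bit lanes. Within each lane, the upper half of the first input
// is interleaved with the upper half of the second input.
std::vector<int> unpckhMask(unsigned numElts, unsigned numLanes) {
  assert(numLanes > 0 && numElts % (2 * numLanes) == 0 &&
         "each lane must hold an even number of elements");
  unsigned laneSize = numElts / numLanes;
  std::vector<int> mask;
  for (unsigned lane = 0; lane != numLanes; ++lane) {
    unsigned base = lane * laneSize;
    for (unsigned i = laneSize / 2; i != laneSize; ++i) {
      mask.push_back(static_cast<int>(base + i));           // from input 1
      mask.push_back(static_cast<int>(base + i + numElts)); // from input 2
    }
  }
  return mask;
}
```

A 128-bit v4f32 unpack-high (`unpckhMask(4, 1)`) gives `<2, 6, 3, 7>`, while the 256-bit v8f32 form (`unpckhMask(8, 2)`) gives `<2, 10, 3, 11, 6, 14, 7, 15>`, which is the 128-bit pattern repeated per lane and not a single interleave of the top four elements.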
Eli Friedman
93dc04d5ca
Prevent x86-specific DAGCombine from creating nodes with illegal type (which could not be selected). Fixes a minor isel issue that was breaking the testcase from r136130.
llvm-svn: 136148
2011-07-26 21:02:58 +00:00
Bruno Cardoso Lopes
d77b383199
More movsldup/movshdup cleanup. Rewrite the mask matching function and add
support for 256-bit versions (but no instruction selection yet, coming next).
llvm-svn: 136050
2011-07-26 02:39:28 +00:00
Bruno Cardoso Lopes
5b268a4b82
More cleanup, subtarget info isn't used here.
llvm-svn: 136049
2011-07-26 02:39:25 +00:00
Bruno Cardoso Lopes
9212bf275d
Codegen allonesvector better while using AVX: vpcmpeqd + vinsertf128
This also fixes PR10452
llvm-svn: 136004
2011-07-25 23:05:32 +00:00
Bruno Cardoso Lopes
123dff0f58
- Handle special scalar_to_vector case: splats. Use a native 128-bit
shuffle before inserting into a 256-bit vector.
- Add AVX versions of movd/movq instructions
- Introduce a few COPY patterns to match insert_subvector instructions.
This turns a trivial insert_subvector instruction into a register copy,
coalescing the xmm into a ymm and avoiding emitting one more instruction.
llvm-svn: 136002
2011-07-25 23:05:25 +00:00
Bruno Cardoso Lopes
276eb8debf
Reintroduce r135730, this is indeed the right approach, there is no
native 256-bit vector instruction to do scalar_to_vector.
llvm-svn: 136001
2011-07-25 23:05:16 +00:00
Eli Friedman
ea8c66fea5
Get rid of an incorrect optimization for shuffles with PALIGNR and simplify isPALIGNRMask.
Addresses PR10466, although the crash from that PR only triggers in cases where DAGCombine misses optimizing a shuffle.
llvm-svn: 135980
2011-07-25 21:36:45 +00:00
Rafael Espindola
77242dd537
Turn shuffles into unpacks for VT == MVT::v2i64 and MVT::v2f64
too. Patch by Jeff Muizelaar.
llvm-svn: 135789
2011-07-22 18:56:05 +00:00
Dan Gohman
c535278cf1
Fix x86's XALUO lowering to return its replacement values instead
of doing the RAUW calls for the overflow value itself. This makes
it more consistent with how the rest of LegalizeDAG works.
llvm-svn: 135788
2011-07-22 18:45:15 +00:00