Commit Graph

4190 Commits

Author SHA1 Message Date
Craig Topper de03ff7063 [X86] Add AVX-512 VTs to findRepresentativeClass as well as v16i16 which was also missing. Change register class to include the extra 16 AVX512 registers.
I'm not completely sure what this method does or why all the 256-bit VTs returned VR128RegClass when the comments on the method definiton say it should return the largest super register class. I just figured AVX-512 should be similar.

llvm-svn: 282836
2016-09-30 04:31:37 +00:00
Craig Topper bc6e97b8f4 [AVX-512] Always use the full 32 register vector classes for addRegisterClass regardless of whether AVX512/VLX is enabled or not.
If AVX512 is disabled, the registers should already be marked reserved. Pattern predicates and register classes on instructions should take care of most of the rest. Loads/stores and physical register copies for XMM16-31 and YMM16-31 without VLX have already been taken care of.

I'm a little unclear why this changed the register allocation of the SSE2 run of the sad.ll test, but the registers selected appear to be valid after this change.

llvm-svn: 282835
2016-09-30 04:31:33 +00:00
Simon Pilgrim 55b8eaa505 Strip trailing whitespace
llvm-svn: 282579
2016-09-28 11:08:00 +00:00
Sanjay Patel 764ae8bd72 [x86] add folds for FP logic with vector zeros
The 'or' case shows up in copysign. The copysign code also had 
redundant checking for a scalar zero operand with 'and', so I 
removed that. 

I'm not sure how to test vector 'and', 'andn', and 'xor' yet, 
but it seems better to just include all of the logic ops since
we're fixing 'or' anyway.

llvm-svn: 282546
2016-09-27 22:28:13 +00:00
Sanjay Patel 43ef1ad0ba [x86] use isNullFPConstant(); NFCI
Also, put the related FP logic functions together to see the similarities. 

llvm-svn: 282522
2016-09-27 18:48:02 +00:00
Ayman Musa d7a5ed4141 [X86][avx512] Fix bug in masked compress store.
Differential Revision: https://reviews.llvm.org/D23984

llvm-svn: 282381
2016-09-26 06:22:08 +00:00
Craig Topper 87155274b8 [X86] Remove what appears to be leftover MMX code involving (v1i64 scalar_to_vector).
llvm-svn: 282361
2016-09-25 16:34:11 +00:00
Craig Topper 8f2e85e669 [AVX-512] Don't use two opcodes for INTR_TYPE_SCALAR_MASK_RM. The handling was such that if the second opcode was present the first was ingored, so we can just have one opcode.
llvm-svn: 282344
2016-09-25 01:03:10 +00:00
Craig Topper 1776f4c965 [X86] Teach combineShuffle to avoid creating floating point operations with integer types and integer operations with floating point types. Seems isOperationLegal lies for mismatched types and operations.
Fixes PR30511.

llvm-svn: 282341
2016-09-24 21:42:49 +00:00
Craig Topper aeca0460f3 [AVX-512] Split scalar version of X86ISD::SELECT into a separate opcode because isel is not robust with multiple type profiles for the same opcode.
llvm-svn: 282340
2016-09-24 21:42:47 +00:00
Sanjay Patel 752ad8fde7 [x86] don't try to create a vector integer inst for an SSE1 target (PR30512)
This bug was introduced with:
http://reviews.llvm.org/rL272511

We need to restrict the lowering to v4f32 comparisons because that's all SSE1 can handle.

This should fix:
https://llvm.org/bugs/show_bug.cgi?id=28044

llvm-svn: 282336
2016-09-24 20:24:06 +00:00
Sanjay Patel 0b36337d61 [x86] fix FCOPYSIGN lowering to create constants instead of ConstantPool loads
This is similar to:
https://reviews.llvm.org/rL279958

By not prematurely lowering to loads, we should be able to more easily eliminate
the 'or' with zero instructions seen in copysign-constant-magnitude.ll.

We should also be able to extend this code to handle vectors.

llvm-svn: 282312
2016-09-23 23:17:29 +00:00
Craig Topper a02e394872 [AVX-512] Split X86ISD::VFPROUND and X86ISD::VFPEXT into separate opcodes for each type constraint.
This revealed that scalar intrinsics could create nodes with a rounding mode of FROUND_CUR_DIRECTION, but the patterns didn't check for it. It just worked because isel doesn't check operand count and we had a pattern without the rounding mode argument at all.

llvm-svn: 282231
2016-09-23 06:24:43 +00:00
Craig Topper 3174b6e467 [AVX-512] Add separate ISD opcodes for each form of CVT instructions. Don't reuse non-X86 ISD opcodes with extra X86 specific arguments.
llvm-svn: 282230
2016-09-23 06:24:39 +00:00
Craig Topper d70ec9b25e [AVX-512] Use different ISD opcodes for some of the scalar intrinsic lowering. Isel is not very robust against using the same ISD opcode with different number of operands so its better to separate.
llvm-svn: 282229
2016-09-23 06:24:35 +00:00
Arnold Schwaighofer 0fd32c005b i386 does not support optimized swifterror handling
rdar://28432565

llvm-svn: 282186
2016-09-22 20:06:25 +00:00
Craig Topper 29f1a1f834 [AVX-512] Split the 3 different usages of the X86ISD::FSETCC opcode into 3 different opcodes.
It turns out isel is really not robust against having different type profiles for the same opcode. It turns out that if you put an illegal rounding mode(i.e. not CUR_DIRECTION or NO_EXC) on a comiss intrinsic we would generate the FSETCC form with the rounding mode added, but then pattern match to an instruction with ROUND_CUR_DIRECTION.

We can probably get away with just one FSETCCM opcode that always contains the rounding mode and explicitly put ROUND_CUR_DIRECTION in the pattern, but I'll leave that for future work.

With this change the clang tests for the comiss intrinsics that used an incorrect rounding mode of 3 properly fail isel instead of silently doing the wrong thing. Those clang tests will be fixed in a follow up commit and I also plan to add rounding mode checking to clang.

llvm-svn: 282055
2016-09-21 06:37:54 +00:00
Craig Topper a27f54b4d9 [AVX-512] Simplify handling of INTR_TYPE_1OP_MASK_RM to remove support for the second opcode since its never used. This makes it consistent with INTR_TYPE_2OP_MASK_RM and INTR_TYPE_3OP_MASK_RM.
And even if it was used we were passing the same operands to both so it wouldn't make sense to have two opcodes.

llvm-svn: 282051
2016-09-21 03:58:41 +00:00
Craig Topper e18258dc1c [AVX-512] Don't lower avx512 vcvtps2ph/vcvtph2ps nodes to ISD::FP16_TO_FP/ISD::FP_TO_FP16 with an extra x86 specific rounding mode operand. We should use a target specific ISD opcode.
llvm-svn: 282046
2016-09-21 02:05:22 +00:00
Elena Demikhovsky d3ff7c288b AVX-512: Fixed a bug in lowering saturated operations on KNL.
The generated code is still not optimal.

Differential Revision: https://reviews.llvm.org/D24723

llvm-svn: 281966
2016-09-20 11:02:26 +00:00
Craig Topper 9820e341f9 [AVX-512] Use 512-bit vcvtps2ph/vcvtph2ps to implement fp_to_f16/f16_to_fp when F16C and VLX are not supported.
Fixes PR23941.

llvm-svn: 281958
2016-09-20 05:44:47 +00:00
Sanjay Patel e97f7947b1 [x86] fix variable names; NFC
llvm-svn: 281953
2016-09-20 00:27:22 +00:00
Sanjay Patel 0fa3365923 [x86] use getSignBit() to simplify code; NFCI
llvm-svn: 281944
2016-09-19 22:07:27 +00:00
Craig Topper b3b5033179 [AVX-512] Add support for lowering fp_to_f16 and f16_to_fp when VLX is supported regardless of whether F16C is also supported.
Still need to add support for lowering using AVX512F when neither VLX or F16C is supported.

llvm-svn: 281884
2016-09-19 02:53:37 +00:00
Craig Topper af5ee86bc9 [AVX-512] Don't lower CVTPD2PS intrinsics to ISD::FP_ROUND with an X86 rounding mode encoding in the second operand. This immediate should only be 0 or 1 and indicates if the truncation loses precision.
Also enhance an assert in SelectionDAG::getNode to flag this sort of problem in the future.

llvm-svn: 281868
2016-09-18 21:49:32 +00:00
Craig Topper cc03165d3f [X86] Fix typo in comment. NFC
llvm-svn: 281862
2016-09-18 18:59:38 +00:00
Simon Pilgrim 6c21e6a54e [X86][SSE] Improve recognition of uitofp conversions that can be performed as sitofp
With D24253 we can now use SelectionDAG::SignBitIsZero with vector operations.

This patch uses SelectionDAG::SignBitIsZero to recognise that a zero sign bit means that we can use a sitofp instead of a uitofp (which is not directly support on pre-AVX512 hardware).

While AVX512 does provide support for uitofp, the conversion to sitofp should not cause any regressions.

Differential Revision: https://reviews.llvm.org/D24343

llvm-svn: 281852
2016-09-18 12:45:23 +00:00
Simon Pilgrim 6736096ac3 [X86][SSE] Improve target shuffle mask extraction
Add ability to extract vXi64 'vzext_movl' masks on 32-bit targets

llvm-svn: 281834
2016-09-17 18:50:54 +00:00
Sanjay Patel c531c9ebf5 [x86] fix formatting; NFC
llvm-svn: 281504
2016-09-14 17:23:18 +00:00
Simon Pilgrim a369219ce6 [X86][SSE] Improve recognition of i64 sitofp conversions that can be performed as i32 (PR29078)
Until AVX512DQ we only support i64/vXi64 sitofp conversion as scalars.

This patch sees if the sign bit extends far enough that we can truncate to a i32 type and then perform sitofp without loss of precision.

Differential Revision: https://reviews.llvm.org/D24345

llvm-svn: 281502
2016-09-14 17:15:26 +00:00
Simon Pilgrim fbbb28ebb3 [X86][SSE] Don't use PSHUFD directly - lower with generic shuffle
Remove the last user of the old getTargetShuffleNode helpers

llvm-svn: 281499
2016-09-14 17:04:22 +00:00
Sanjay Patel 1ed771f5d7 getVectorElementType().getSizeInBits() -> getScalarSizeInBits() ; NFCI
llvm-svn: 281495
2016-09-14 16:37:15 +00:00
Sanjay Patel b1f0a0f4a8 getValueType().getSizeInBits() -> getValueSizeInBits() ; NFCI
llvm-svn: 281493
2016-09-14 16:05:51 +00:00
Sanjay Patel 5f6bb6cd24 getValueType().getScalarSizeInBits() -> getScalarValueSizeInBits() ; NFCI
llvm-svn: 281490
2016-09-14 15:43:44 +00:00
Sanjay Patel bd6fca1419 getScalarType().getSizeInBits() -> getScalarSizeInBits() ; NFCI
llvm-svn: 281489
2016-09-14 15:21:00 +00:00
Simon Pilgrim ec2d206669 [X86][SSE] Removed unused getTargetShuffleNode function
llvm-svn: 281481
2016-09-14 14:30:00 +00:00
Simon Pilgrim ba325e3a73 [X86][SSE] Don't blend vector shifts with MOVSS/MOVSD directly, lower from generic shuffle
Shuffle lowering will correctly lower to MOVSS/MOVSD/PBLEND, improving commutation opportunities

llvm-svn: 281471
2016-09-14 14:08:18 +00:00
Elena Demikhovsky 0569d9d588 AVX-512: Fixed a bug in kortest.z intrinsic
Lowering was wrong - X86ISD::SETCC node should return i8 type.

llvm-svn: 281446
2016-09-14 08:06:54 +00:00
Igor Breger 74813fc19c [AVX512BW] Change truncStore action (v16i16->v16i18). It can be legal only with AVX512VL.
Differential Revision: http://reviews.llvm.org/D24547

llvm-svn: 281445
2016-09-14 08:04:28 +00:00
Simon Pilgrim a3d1e03cd7 [X86][XOP] Fix VPERMIL2PD mask creation on 32-bit targets
Use getConstVector helper to correctly create v2i64/v4i64 constants on 32-bit targets

llvm-svn: 281105
2016-09-09 21:47:21 +00:00
Hans Wennborg c39ef776fc Win64: Don't use REX prefix for direct tail calls
The REX prefix should be used on indirect jmps, but not direct ones.
For direct jumps, the unwinder looks at the offset to determine if
it's inside the current function.

Differential Revision: https://reviews.llvm.org/D24359

llvm-svn: 281003
2016-09-08 23:35:10 +00:00
Wei Mi f100d4e93d Don't reduce the width of vector mul if the target doesn't support SSE2.
The patch is to fix PR30298, which is caused by rL272694. The solution is to
bail out if the target has no SSE2.

Differential Revision: https://reviews.llvm.org/D24288

llvm-svn: 280837
2016-09-07 18:22:17 +00:00
Sanjay Patel 0bf9a99c7d [x86] move combines of 'select of 2 constants' to its own function; NFC
There are missing folds here and possibly folds that could be made generic.

llvm-svn: 280817
2016-09-07 15:47:34 +00:00
Elena Demikhovsky f0ddd1b8b5 AVX512F: FMA intrinsic + FNEG - sequence optimization
The previous commit (r280368 - https://reviews.llvm.org/D23313) does not cover AVX-512F, KNL set.
FNEG(x) operation is lowered to (bitcast (vpxor (bitcast x), (bitcast constfp(0x80000000))).
It happens because FP XOR is not supported for 512-bit data types on KNL and we use integer XOR instead.
I added pattern match for integer XOR.

Differential Revision: https://reviews.llvm.org/D24221

llvm-svn: 280785
2016-09-07 06:54:28 +00:00
Craig Topper 4fa3b50fc3 [AVX-512] Fix masked VPERMI2PS isel when the index comes from a bitcast.
We need to bitcast the index operand to a floating point type so that it matches the result type. If not then the passthru part of the DAG will be a bitcast from the index's original type to the destination type. This makes it very difficult to match. The other option would be to add 5 sets of patterns for every other possible type.

llvm-svn: 280696
2016-09-06 06:56:59 +00:00
Craig Topper 43fbd840dd [X86] Remove unused encoding from IntrinsicType enum.
llvm-svn: 280694
2016-09-06 05:45:24 +00:00
Craig Topper a0055d315d [X86] Fix indentation. NFC
llvm-svn: 280693
2016-09-06 05:45:21 +00:00
Craig Topper 62d0a5e7d3 [AVX-512] Fix v8i64 shift by immediate lowering on 32-bit targets.
llvm-svn: 280684
2016-09-06 00:31:10 +00:00
Igor Breger 7e2a0dfa0c revert r279960.
https://llvm.org/bugs/show_bug.cgi?id=30249

llvm-svn: 280625
2016-09-04 14:03:52 +00:00
Simon Pilgrim 3606d2346c Strip trailing whitespace
llvm-svn: 280598
2016-09-03 20:36:05 +00:00
Michael Kuperstein 5f17d08f49 [SelectionDAG] Generate vector_shuffle nodes for undersized result vector sizes
Prior to this, we could generate a vector_shuffle from an IR shuffle when the
size of the result was exactly the sum of the sizes of the input vectors.
If the output vector was narrower - e.g. a <12 x i8> being formed by a shuffle
with two <8 x i8> inputs - we would lower the shuffle to a sequence of extracts
and inserts.

Instead, we can form a larger vector_shuffle, and then extract a subvector
of the right size - e.g. shuffle the two <8 x i8> inputs into a <16 x i8>
and then extract a <12 x i8>.

This also includes a target-specific X86 combine that in the presence of
AVX2 combines:
(vector_shuffle <mask> (concat_vectors t1, undef)
                       (concat_vectors t2, undef))
into:
(vector_shuffle <mask> (concat_vectors t1, t2), undef)
in cases where this allows us to form VPERMD/VPERMQ.

(This is not a separate commit, as that pattern does not appear without
the DAGBuilder change.)

llvm-svn: 280418
2016-09-01 21:32:09 +00:00
Elena Demikhovsky 4d7738dfde Optimized FMA intrinsic + FNEG , like
-(a*b+c)

and FNEG + FMA, like
a*b-c or (-a)*b+c.

The bug description is here :  https://llvm.org/bugs/show_bug.cgi?id=28892

Differential revision: https://reviews.llvm.org/D23313

llvm-svn: 280368
2016-09-01 13:58:53 +00:00
Michael Kuperstein 173b43da35 Fix typo in comment. NFC.
llvm-svn: 280025
2016-08-29 22:49:05 +00:00
Haojian Wu eab33cecf3 Fix -Wunused-but-set-variable warning.
Summary: A follow-up fix on r279958.

Reviewers: bkramer

Subscribers: cfe-commits

Differential Revision: https://reviews.llvm.org/D23989

llvm-svn: 279964
2016-08-29 12:26:33 +00:00
Igor Breger 1a388871b9 [AVX512] In some cases KORTEST instruction may be used instead of ZEXT + TEST sequence.
Differential Revision: http://reviews.llvm.org/D23490

llvm-svn: 279960
2016-08-29 08:52:52 +00:00
Craig Topper 713085e60a [X86] Don't lower FABS/FNEG masking directly to a ConstantPool load. Just create a ConstantFPSDNode and let that be lowered.
This allows broadcast loads to used when available.

llvm-svn: 279958
2016-08-29 04:49:31 +00:00
Craig Topper f0e822ff31 [AVX-512] Always use v8i64 when converting 512-bit FAND/FOR/FXOR/FANDN to integer operations when DQI isn't supported. This is consistent with the recent changes to promote logical operations to i64 vectors.
llvm-svn: 279957
2016-08-29 04:49:27 +00:00
Simon Pilgrim 5369cd9e9c [X86][AVX512] Only combine EVEX targets shuffles to shuffles of the same number of vector elements
Over eager combing prevents the correct folding of writemasks.

At the moment this occurs for ALL EVEX shuffles, in the future we need to check that the user of the root shuffle is a VSELECT that can fold to a writemask.

llvm-svn: 279934
2016-08-28 17:27:14 +00:00
Craig Topper abe80cc04d [AVX-512] Promote AND/OR/XOR to v2i64/v4i64/v8i64 even when we have AVX512F/AVX512VL.
Previously we weren't creating masked logical operations if bitcasts appeared between the logic operation and the select. The IR optimizers can move bitcasts across logic operations and create these cases. To minimize the number of cases we need to handle, this change promotes all logic ops to an i64 vector type just like when only SSE or AVX is available.

Unfortunately, this also has the consequence of making it difficult to select unmasked VPANDD/VPORD/VPXORD in all the cases it was previously used. This is the cause of most of the test change. This shouldn't result in any functional change though.

llvm-svn: 279929
2016-08-28 06:06:28 +00:00
Michael Kuperstein 2ee911e985 Revert r274613 because it breaks the test suite with AVX512
This reverts most of r274613 (AKA r274626) and its follow-ups (r276347, r277289),
due to miscompiles in the test suite. The FastISel change was left in, because
it apparently fixes an unrelated issue.

(Recommit of r279782 which was broken due to a bad merge.)

This fixes 4 out of the 5 test failures in PR29112.

llvm-svn: 279788
2016-08-25 22:48:11 +00:00
Michael Kuperstein 6e271f4ce8 Revert r279782 due to debug buildbot breakage.
llvm-svn: 279785
2016-08-25 22:14:45 +00:00
Michael Kuperstein a6ccc8d365 Revert r274613 because it breaks the test suite with AVX512
This reverts most of r274613 and its follow-ups (r276347, r277289), due to
miscompiles in the test suite. The FastISel change was left in, because it
apparently fixes an unrelated issue.

This fixes 4 out of the 5 test failures in PR29112.

llvm-svn: 279782
2016-08-25 21:55:41 +00:00
Michael Kuperstein 40887c5566 [X86] 512-bit VPAVG requires AVX512BW
Fix VPAVG detection to require AVX512BW, not AVX512F for 512-bit widths,
and change associated asserts to assert in the right direction...

This fixes PR29111.

llvm-svn: 279755
2016-08-25 17:17:46 +00:00
Simon Pilgrim 5aa9c203ac [X86][SSE] INSERTPS is only combined on v4f32 types. NFCI.
llvm-svn: 279751
2016-08-25 17:02:00 +00:00
Simon Pilgrim 0ad9f3e93b [X86][AVX] Provide SubVectorBroadcast fallback if load fold fails (PR29133)
Fix for PR29133, matching the approach that was taken for AVX1 scalar broadcasts.

llvm-svn: 279735
2016-08-25 12:45:16 +00:00
Simon Pilgrim 941bd6bbae [X86][SSE] Add support for combining VZEXT_MOVL target shuffles
Includes adding more general support for the pattern: VZEXT_MOVL(VZEXT_LOAD(ptr)) -> VZEXT_LOAD(ptr)

This has unearthed a couple of latent poor codegen issues (MINSS/MAXSS scalar load folding and MOVDDUP/BROADCAST load folding patterns), which will be fixed shortly.

Its also reduced a couple of tests so that they no longer reach the instruction threshold necessary to be combined to PSHUFB (see PR26183).

llvm-svn: 279646
2016-08-24 18:07:53 +00:00
Simon Pilgrim 7a50c8c2ba [X86][AVX2] Ensure on 32-bit targets that we broadcast f64 types not i64 (PR29101)
llvm-svn: 279622
2016-08-24 12:42:31 +00:00
Simon Pilgrim 6392b8d4ce [X86][SSE] Add support for 32-bit element vectors to X86ISD::VZEXT_LOAD
Consecutive load matching (EltsFromConsecutiveLoads) currently uses VZEXT_LOAD (load scalar into lowest element and zero uppers) for vXi64 / vXf64 vectors only.

For vXi32 / vXf32 vectors it instead creates a scalar load, SCALAR_TO_VECTOR and finally VZEXT_MOVL (zero upper vector elements), relying on tablegen patterns to match this into an equivalent of VZEXT_LOAD.

This patch adds the VZEXT_LOAD patterns for vXi32 / vXf32 vectors directly and updates EltsFromConsecutiveLoads to use this.

This has proven necessary to allow us to easily make VZEXT_MOVL a full member of the target shuffle set - without this change the call to combineShuffle (which is the main caller of EltsFromConsecutiveLoads) tended to recursively recreate VZEXT_MOVL nodes......

Differential Revision: https://reviews.llvm.org/D23673

llvm-svn: 279619
2016-08-24 10:46:40 +00:00
Simon Pilgrim c8ad5c069c [X86][AVX] Don't use SubVectorBroadcast if there are additional users of the chain (PR29088)
We could improve on this by making X86SubVBroadcast a full memory intrinsic similar to X86vzload

llvm-svn: 279441
2016-08-22 16:47:55 +00:00
Simon Pilgrim 13fa33012b [X86] Only accept SM_SentinelUndef (-1) as an undefined shuffle mask in range
As discussed on D23027 we should be trying to be more strict on what is an undefined mask value.

llvm-svn: 279435
2016-08-22 13:18:56 +00:00
Simon Pilgrim 2279e59573 [X86][SSE] Avoid specifying unused arguments in SHUFPD lowering
As discussed on PR26491, we are missing the opportunity to make use of the smaller MOVHLPS instruction because we set both arguments of a SHUFPD when using it to lower a single input shuffle.

This patch sets the lowered argument to UNDEF if that shuffle element is undefined. This in turn makes it easier for target shuffle combining to decode UNDEF shuffle elements, allowing combines to MOVHLPS to occur.

A fix to match against MOVHPD stores was necessary as well.

This builds on the improved MOVLHPS/MOVHLPS lowering and memory folding support added in D16956

Adding similar support for SHUFPS will have to wait until have better support for target combining of binary shuffles.

Differential Revision: https://reviews.llvm.org/D23027

llvm-svn: 279430
2016-08-22 12:56:54 +00:00
Simon Pilgrim 67e7e22462 [X86][AVX] Dropped combineShuffle256 - this can now be performed by EltsFromConsecutiveLoads
llvm-svn: 279397
2016-08-21 15:39:45 +00:00
Simon Pilgrim d7a3782ae4 [X86][SSE] Generalised combining to VZEXT_MOVL to any vector size
This doesn't change tests codegen as we already combined to blend+zero which is what we lower VZEXT_MOVL to on SSE41+ targets, but it does put us in a better position when we improve shuffling for optsize.

llvm-svn: 279273
2016-08-19 17:02:00 +00:00
Simon Pilgrim f1b8fdc074 [X86][SSE] Add support for matching commuted insertps patterns
INSERTPS doesn't fit well with our shuffle mask canonicalization, so we need to attempt both the original mask and the commuted mask to more likely get a match

llvm-svn: 279230
2016-08-19 10:31:53 +00:00
Sanjay Patel 7d37b221a2 fix typo; NFC
llvm-svn: 279068
2016-08-18 14:17:34 +00:00
Justin Bogner cd1d5aaf2e Replace a few more "fall through" comments with LLVM_FALLTHROUGH
Follow up to r278902. I had missed "fall through", with a space.

llvm-svn: 278970
2016-08-17 20:30:52 +00:00
Justin Bogner b03fd12cef Replace "fallthrough" comments with LLVM_FALLTHROUGH
This is a mechanical change of comments in switches like fallthrough,
fall-through, or fall-thru to use the LLVM_FALLTHROUGH macro instead.

llvm-svn: 278902
2016-08-17 05:10:15 +00:00
Simon Pilgrim cc316f013a [X86][SSE] Add support for combining v2f64 target shuffles to VZEXT_MOVL byte rotations
The combine was only matching v2i64 as it assumed lowering to MOVQ - but we have v2f64 patterns that match in a similar fashion

llvm-svn: 278794
2016-08-16 12:52:06 +00:00
Simon Pilgrim f16cd361d4 [X86][SSE] Add support for combining target shuffles to PALIGNR byte rotations
llvm-svn: 278787
2016-08-16 10:03:23 +00:00
Guy Blank 722caebdae [X86] Add xgetbv/xsetbv intrinsics to non-windows platforms
Differential Revision: https://reviews.llvm.org/D21958

llvm-svn: 278782
2016-08-16 06:41:00 +00:00
Igor Breger 505f2cc468 [AVX512] Fix VFPCLASSSD/VFPCLASSSS intrinsic lowering. The i1 result should be zero extended according to SPEC.
Differential Revision: http://reviews.llvm.org/D23489

llvm-svn: 278626
2016-08-14 13:58:57 +00:00
Igor Breger 8672408db0 [AVX512] Fix insertelement i1 lowering.
1. Use shuffle to insert element i1 into vector. The previous implementation was incorrect ( dest_bit OR src_bit , it doesn't clear the bit if src_bit=0 )
2. Improve shuffle i1 vector, use CVT2MASK if supported instead TRUNCATE.

Differential Revision: http://reviews.llvm.org/D23347

llvm-svn: 278623
2016-08-14 05:25:07 +00:00
Artur Pilipenko 87e4038a91 [x86] X86ISelLowering zext(add_nuw(x, C)) --> add(zext(x), C_zext)
Currently X86ISelLowering has a similar transformation for sexts:
sext(add_nsw(x, C)) --> add(sext(x), C_sext)

In this change I extend this code to handle zexts as well.

Reviewed By: spatel

Differential Revision: https://reviews.llvm.org/D23359

llvm-svn: 278520
2016-08-12 16:08:30 +00:00
Simon Pilgrim 687d71e877 [X86][SSE] Add support for combining target shuffles to PSLLDQ/PSRLDQ byte shifts
llvm-svn: 278502
2016-08-12 11:24:34 +00:00
Simon Pilgrim ed96b9adfb [X86][SSE] Fixed PALIGNR target shuffle decode
The PALIGNR target shuffle decode was not taking into account that DecodePALIGNRMask (rather oddly) expects the operands to be in reverse order, nor was it detecting unary patterns, causing combines to combine with the incorrect input.

The cgbuiltin, auto upgrade and instruction comments code correctly swap the operands so are not affected.

llvm-svn: 278494
2016-08-12 10:10:51 +00:00
David Majnemer 42531260b3 Use the range variant of find/find_if instead of unpacking begin/end
If the result of the find is only used to compare against end(), just
use is_contained instead.

No functionality change is intended.

llvm-svn: 278469
2016-08-12 03:55:06 +00:00
David Majnemer 562e82945e Use the range variant of find_if instead of unpacking begin/end
No functionality change is intended.

llvm-svn: 278443
2016-08-12 00:18:03 +00:00
David Majnemer 0d955d0bf5 Use the range variant of find instead of unpacking begin/end
If the result of the find is only used to compare against end(), just
use is_contained instead.

No functionality change is intended.

llvm-svn: 278433
2016-08-11 22:21:41 +00:00
David Majnemer 0a16c22846 Use range algorithms instead of unpacking begin/end
No functionality change is intended.

llvm-svn: 278417
2016-08-11 21:15:00 +00:00
Igor Breger a77b14d02c [AVX512] Fix extractelement i1 lowering.
The previous implementation (not custom) doesn't enforce zeroing off upper bits. The assumption is that i1 PRODUCER (truncate and extractelement) must zero all upper bits, so i1 CONSUMER instructions ( test, zext, save, etc) can be done without additional zeroing.
Make extractelement i1 lowering custom for all vector i1.

Differential Revision: http://reviews.llvm.org/D23246

llvm-svn: 278328
2016-08-11 12:13:46 +00:00
Craig Topper a78b768ed4 [AVX-512] Promote 512-bit integer loads to v8i64 similar to what is done for 128/256-bit vectors for overall consistency.
llvm-svn: 278318
2016-08-11 06:04:07 +00:00
Sanjay Patel 5ccc85fe83 [x86, AVX] allow FP vector select folding to bitwise logic ops (PR28895)
This handles the case in:
https://llvm.org/bugs/show_bug.cgi?id=28895

...but we are not getting all of the possibilities yet. 
Eg, we use 'X86::FANDN' for scalar FP select combines.

That enhancement is filed as:
https://llvm.org/bugs/show_bug.cgi?id=28925

Differential Revision: https://reviews.llvm.org/D23337

llvm-svn: 278270
2016-08-10 19:00:11 +00:00
Simon Pilgrim 675c257a32 [X86][SSE] Dropped blend(insertps(x,y),zero) combine - this is now handled by target shuffle chain combining
llvm-svn: 278260
2016-08-10 18:10:29 +00:00
Simon Pilgrim ac8fa6c2c6 [X86][SSE] Add support for combining target shuffles to MOVSS/MOVSD
Only do this on pre-SSE41 targets where we should be lowering to BLENDPS/BLENDPD instead

llvm-svn: 278228
2016-08-10 14:15:41 +00:00
Simon Pilgrim 9811e98495 [X86][SSE] Only treat SM_SentinelUndef as UNDEF in shuffle mask predicates
isUndefOrEqual and isUndefOrInRange treated all -ve shuffle mask values as UNDEF, now it has to be SM_SentinelUndef (-1)

We already have asserts to check that lowered SHUFFLE_VECTOR indices are in the range -1 <= index < 2*masksize (or masksize for unary shuffles)

llvm-svn: 278218
2016-08-10 12:55:25 +00:00
Simon Pilgrim cb419a896c [X86][SSE] Reorder shuffle mask undef helper predicates. NFCI
To make it easier for a more complex helper to use a simpler one

llvm-svn: 278216
2016-08-10 12:34:23 +00:00
Simon Pilgrim 27740d038c [X86][XOP] Add support for combining target shuffles to VPERMIL2PD/VPERMIL2PS
llvm-svn: 278120
2016-08-09 12:56:15 +00:00
Simon Pilgrim aae7d4a1b6 [X86][XOP] Add support for combining target shuffles to VPPERM
llvm-svn: 278114
2016-08-09 10:56:29 +00:00
Sanjay Patel 06ba09af67 [x86] split combineVSelectWithAllOnesOrZeros into a helper function; NFCI
llvm-svn: 278074
2016-08-09 00:01:11 +00:00
Charles Davis e9c32c7ed3 Revert "[X86] Support the "ms-hotpatch" attribute."
This reverts commit r278048. Something changed between the last time I
built this--it takes awhile on my ridiculously slow and ancient
computer--and now that broke this.

llvm-svn: 278053
2016-08-08 21:20:15 +00:00