Commit Graph

4344 Commits

Author SHA1 Message Date
Craig Topper 6677bb4e50 [AVX-512] Teach LowerFormalArguments to use the extended register class when available. Fix the avx512vl stack folding tests to clobber more registers or otherwise they use xmm16 after this change.
llvm-svn: 287971
2016-11-26 07:20:57 +00:00
Simon Pilgrim 8e8ae7219f Use SDValue helper instead of explicitly going via SDValue::getNode(). NFCI
llvm-svn: 287940
2016-11-25 17:19:53 +00:00
Craig Topper 88071b37ab [AVX-512] Add support for changing VSHUFF64x2 to VSHUFF32x4 when its feeding a vselect with 32-bit element size.
Summary:
Shuffle lowering may have widened the element size of a i32 shuffle to i64 before selecting X86ISD::SHUF128. If this shuffle was used by a vselect this can prevent us from selecting masked operations.

This patch detects this and changes the element size to match the vselect.

I don't handle changing integer to floating point or vice versa as its not clear if its better to push such a bitcast to the inputs of the shuffle or to the user of the vselect. So I'm ignoring that case for now.

Reviewers: delena, zvi, RKSimon

Subscribers: llvm-commits

Differential Revision: https://reviews.llvm.org/D27087

llvm-svn: 287939
2016-11-25 16:48:05 +00:00
Simon Pilgrim f1ee930db0 Fix unused variable warning
llvm-svn: 287889
2016-11-24 15:24:47 +00:00
Simon Pilgrim 9c71e07276 [X86][SSE] Improve UINT_TO_FP v2i32 -> v2f64
Vectorize UINT_TO_FP v2i32 -> v2f64 instead of scalarization (albeit still on the SIMD unit).

The codegen matches that generated by legalization (and is in fact used by AVX for UINT_TO_FP v4i32 -> v4f64), but has to be done in the x86 backend to account for legalization via 4i32.

Differential Revision: https://reviews.llvm.org/D26938

llvm-svn: 287886
2016-11-24 15:12:56 +00:00
Simon Pilgrim 841d7ca463 [X86][AVX512] Add support for v2i64 fptosi/fptoui/sitofp/uitofp on AVX512DQ-only targets
Use 512-bit instructions with subvector insertion/extraction like we do in a number of similar circumstances

llvm-svn: 287882
2016-11-24 14:46:55 +00:00
Simon Pilgrim ab323ec411 [X86][AVX512DQVL] Add support for v2i64 -> v2f32 SINT_TO_FP/UINT_TO_FP lowering
llvm-svn: 287877
2016-11-24 13:38:59 +00:00
Nikolai Bozhenov 3a8d108b2b [x86] Fixing PR28755 by precomputing the address used in CMPXCHG8B
The bug arises during register allocation on i686 for
CMPXCHG8B instruction when base pointer is needed. CMPXCHG8B
needs 4 implicit registers (EAX, EBX, ECX, EDX) and a memory address,
plus ESI is reserved as the base pointer. With such constraints the only
way register allocator would do its job successfully is when the addressing
mode of the instruction requires only one register. If that is not the case
- we are emitting additional LEA instruction to compute the address.

It fixes PR28755.

Patch by Alexander Ivchenko <alexander.ivchenko@intel.com>

Differential Revision: https://reviews.llvm.org/D25088

llvm-svn: 287875
2016-11-24 13:23:35 +00:00
Nikolai Bozhenov bb64aa14a3 [x86] Minor refactoring of X86TargetLowering::EmitInstrWithCustomInserter
Move the definitions of three variables out of the switch.

Patch by Alexander Ivchenko <alexander.ivchenko@intel.com>

Differential Revision: https://reviews.llvm.org/D25192

llvm-svn: 287874
2016-11-24 13:15:49 +00:00
Simon Pilgrim a3af79678e [X86] Generalize CVTTPD2DQ/CVTTPD2UDQ and CVTDQ2PD/CVTUDQ2PD opcodes. NFCI
Replace the CVTTPD2DQ/CVTTPD2UDQ and CVTDQ2PD/CVTUDQ2PD opcodes with general versions.

This is an initial step towards similar FP_TO_SINT/FP_TO_UINT and SINT_TO_FP/UINT_TO_FP lowering to AVX512 CVTTPS2QQ/CVTTPS2UQQ and CVTQQ2PS/CVTUQQ2PS with illegal types.

Differential Revision: https://reviews.llvm.org/D27072

llvm-svn: 287870
2016-11-24 12:13:46 +00:00
Simon Pilgrim 4e9b9cbee9 [X86][AVX512] Add support for v4i64 fptosi/fptoui/sitofp/uitofp on AVX512DQ-only targets
Use 512-bit instructions with subvector insertion/extraction like we do in a number of similar circumstances

llvm-svn: 287762
2016-11-23 14:01:18 +00:00
Zvi Rackover 14aba43ea9 [X86] Simplify lowerVectorShuffleAsBitMask to handle only integer VT's
Summary: This function is only called with integer VT arguments, so remove code that handles FP vectors.

Reviewers: RKSimon, craig.topper, delena, andreadb

Subscribers: llvm-commits

Differential Revision: https://reviews.llvm.org/D26985

llvm-svn: 287743
2016-11-23 06:45:25 +00:00
Simon Pilgrim 4aa876ca7c [X86][SSE] Combine UNPCKL(FHADD,FHADD) -> FHADD for v2f64 shuffles.
This occurs during UINT_TO_FP v2f64 lowering. 

We can easily generalize this to other horizontal ops (FHSUB, PACKSS, PACKUS) as required - we are doing something similar with PACKUS in lowerV2I64VectorShuffle

llvm-svn: 287676
2016-11-22 17:50:06 +00:00
Zvi Rackover 9a355219d1 [X86] Change lowerBuildVectorToBitOp() to take a BuildVectorSDNode. NFC.
llvm-svn: 287644
2016-11-22 15:33:28 +00:00
Zvi Rackover 0aa1c32d14 [X86] Remove dead code from LowerVectorBroadcast
Summary: Splat vectors are canonicalized to BUILD_VECTOR's so the code can be simplified. NFC-ish.

Reviewers: craig.topper, delena, RKSimon, andreadb

Subscribers: RKSimon, llvm-commits

Differential Revision: https://reviews.llvm.org/D26678

llvm-svn: 287643
2016-11-22 15:17:52 +00:00
Craig Topper da22267055 [AVX-512] Add support for changing the element size of PALIGNR/VALIGND/VALIGNQ shuffles if they feed a vselect with a different type
Summary:
Shuffle lowering widens the element size of a shuffle if elements are contiguous. This is sometimes help because wider element types have more shuffle options. If the shuffle is one of the arguments to a vselect this shuffle widening can introduce a bitcast between the vselect and the shuffle. This will prevent isel from selecting a masked operation. If the shuffle can be written equally efficiently with a different element size to match the vselect type we should change the shuffle type to allow masking.

This patch does this conversion for all VALIGND/VALIGNQ sizes. It also supports turning 128-bit PALIGNR into VALIGND/VALIGNQ. This fixes the case shown in PR31018.

I plan to add support for more operations in future patches.

Reviewers: RKSimon, zvi, delena

Subscribers: llvm-commits

Differential Revision: https://reviews.llvm.org/D26902

llvm-svn: 287612
2016-11-22 03:51:53 +00:00
Simon Pilgrim b7bbaa669b [X86][SSE] Allow PACKSS to be used to truncate any type of all/none sign bits input
At the moment we only use truncateVectorCompareWithPACKSS with direct vector comparison results (just one example of a known all/none signbits input).

This change relaxes the direct matching of a SETCC opcode by moving the logic up into SelectionDAG::ComputeNumSignBits and accepting any input with a known splatted signbit.

llvm-svn: 287535
2016-11-21 12:05:49 +00:00
Simon Pilgrim 5fadce4a3f [X86][AVX512] Combine unary + zero target shuffles to VPERMV3 with a zero vector where possible
llvm-svn: 287497
2016-11-20 16:11:36 +00:00
Simon Pilgrim 5401bae523 [X86][AVX512] Add support for VBMI VPERMV3 target shuffle combines
llvm-svn: 287496
2016-11-20 15:24:38 +00:00
Simon Pilgrim 3f40412e0f [X86][AVX512] Add support for VBMI VPERMV target shuffle combines
llvm-svn: 287495
2016-11-20 15:05:45 +00:00
Simon Pilgrim c17e1b74b8 [X86][AVX512VL] Removed duplicate operation action
Basic AVX512F already declared uint_to_fp v4i32 as legal

llvm-svn: 287493
2016-11-20 14:19:29 +00:00
Simon Pilgrim 096b6d4f81 [X86][AVX512F] Add support for uint_to_fp v2i32 to v2f64 on AVX512F-only targets
Use 512-bit instructions (we already do something similar for uint_to_fp v4i32 to v4f64)

llvm-svn: 287491
2016-11-20 14:03:23 +00:00
Oren Ben Simhon c0f073b67f [X86] RegCall - Handling long double arguments
The change is part of RegCall calling convention support for LLVM.
Long double (f80) requires special treatment as the first f80 parameter is saved in FP0 (floating point stack).
This review present the change and the corresponding tests.

Differential Revision: https://reviews.llvm.org/D26151

llvm-svn: 287485
2016-11-20 11:06:07 +00:00
Simon Pilgrim a14e0cb852 [X86][SSE] Improve PSHUFB lowering from either input
Canonicalization may leave the zeroable vector in the first input.

llvm-svn: 287461
2016-11-19 20:41:48 +00:00
Simon Pilgrim 623a7c57b5 [X86][AVX512] Add VPERMV/VPERMV3 v64i8 byte shuffles on avx512vbmi targets
llvm-svn: 287459
2016-11-19 20:12:34 +00:00
Craig Topper 893ea9fb2c [X86] Simplify some code a little by removing a dulicate variable and combinining two if statements. NFCI
llvm-svn: 287443
2016-11-19 17:33:17 +00:00
Simon Pilgrim 7938bd666e Cleanup function with clang-format. NFCI.
llvm-svn: 287340
2016-11-18 12:16:18 +00:00
Craig Topper 07f1c15995 [AVX-512] Support FCOPYSIGN for v16f32 and v8f64
Summary:
This extends FCOPYSIGN support to 512-bit vectors.

I've also added tests to show what the 128-bit and 256-bit cases look like with broadcast loads.

Reviewers: delena, zvi, RKSimon, spatel

Subscribers: llvm-commits

Differential Revision: https://reviews.llvm.org/D26791

llvm-svn: 287298
2016-11-18 02:25:34 +00:00
Simon Pilgrim 9d15fb3c10 Fix spelling mistakes in X86 target comments. NFC.
Identified by Pedro Giffuni in PR27636.

llvm-svn: 287247
2016-11-17 19:03:05 +00:00
Simon Pilgrim 67ef3b984a Wdocumentation fix
llvm-svn: 287224
2016-11-17 12:21:45 +00:00
Simon Pilgrim 8eca5520dc [X86][SSE] Improve lowering of vXi64 multiply with known zero 32-bit halves
vXi64 multiplication is lowered into 3 calls of vpmuludq with the upper/lower 32-bit halves.

If any of these halves are zero then we can remove individual calls. Although there was isBuildVectorAllZeros code to do this I don't think it ever worked (maybe just for constant folded cases that don't seem to be tested for any longer).

This requires additional X86ISD support for computeKnownBitsForTargetNode, so far I've just added support for X86ISD::VZEXT (VPMOVZX* - helping the AVX2+ cases).

Partial fix for PR30845

Differential Revision: https://reviews.llvm.org/D26590

llvm-svn: 287223
2016-11-17 12:14:49 +00:00
Oren Ben Simhon 489d6eff4f [X86] RegCall - Handling v64i1 in 32/64 bit target
Register Calling Convention defines a new behavior for v64i1 types.
This type should be saved in GPR.
However for 32 bit machine we need to split the value into 2 GPRs (because each is 32 bit).

Differential Revision: https://reviews.llvm.org/D26181

llvm-svn: 287217
2016-11-17 09:59:40 +00:00
Craig Topper 05b0fcd168 [X86] Fix formatting. NFC
llvm-svn: 287211
2016-11-17 05:59:55 +00:00
Sanjay Patel 066139a3ec [x86] allow FP-logic ops when one operand is FP and result is FP
We save an inter-register file move this way. If there's any CPU where
the FP logic is slower, we could transform this back to int-logic in 
MachineCombiner.

This helps, but doesn't solve, PR6137:
https://llvm.org/bugs/show_bug.cgi?id=6137

The 'andn' test shows that we're missing a pattern match to
recognize the xor with -1 constant as a 'not' op.

llvm-svn: 287171
2016-11-16 22:34:05 +00:00
Simon Pilgrim b57dd17142 [X86][AVX512] Autoupgrade lossless i32/u32 to f64 conversion intrinsics with generic IR
Both the (V)CVTDQ2PD (i32 to f64) and (V)CVTUDQ2PD (u32 to f64) conversion instructions are lossless and can be safely represented as generic SINT_TO_FP/UINT_TO_FP calls instead of x86 intrinsics without affecting final codegen.

LLVM counterpart to D26686

Differential Revision: https://reviews.llvm.org/D26736

llvm-svn: 287108
2016-11-16 14:48:32 +00:00
Sanjay Patel 73d1d35d21 fix formatting; NFC
llvm-svn: 286989
2016-11-15 17:47:13 +00:00
Simon Pilgrim ceffb43b1b [X86][SSE] Improve SINT_TO_FP of boolean vector results (signum)
This patch helps avoids poor legalization of boolean vector results (e.g. 8f32 -> 8i1 -> 8i16) that feed into SINT_TO_FP by inserting an early SIGN_EXTEND and so help improve the truncation logic.

This is not necessary for AVX512 targets where boolean vectors are legal - AVX512 manages to lower ( sint_to_fp vXi1 ) into some form of ( select mask, 1.0f , 0.0f ) in most cases.

Fix for PR13248

Differential Revision: https://reviews.llvm.org/D26583

llvm-svn: 286979
2016-11-15 16:24:40 +00:00
Craig Topper 5cb13062d2 [AVX-512] Add support for lowering shuffles to VALIGND/VALIGNQ
Summary: VALIGND and VALIGNQ are similar to PALIGNR but instead of working on a 128-bit lane they work on the entire vector register. This change leverages the shuffle rotate detection code used for PALIGNR to detect these cases.

Reviewers: delena, RKSimon

Subscribers: Farhana, llvm-commits

Differential Revision: https://reviews.llvm.org/D26297

llvm-svn: 286709
2016-11-12 05:05:27 +00:00
Evandro Menezes 21f9ce1a0d [DAG Combiner] Fix the native computation of the Newton series for reciprocals
The generic infrastructure to compute the Newton series for reciprocal and
reciprocal square root was conceived to allow a target to compute the series
itself.  However, the original code did not properly consider this condition
if returned by a target.  This patch addresses the issues to allow a target
to compute the series on its own.

Differential revision: https://reviews.llvm.org/D22975

llvm-svn: 286523
2016-11-10 23:31:06 +00:00
Craig Topper f334ac19ad [AVX-512] Add lowering to cvttpd2udq/cvttps2udq for fptoui v2f64/2f32 to 2i32
This patch adds support for fptoui to 2i32 from both 2f64 and 2f32, building on Simon's change for the signed version in r284459 and using AVX-512 instructions.

If we don't have VLX support we need to use a 512-bit operation for v2f64->v2i32 and extract the result.

It also recognises that cvttpd2udq zeroes the upper 64-bits of the xmm result.

Differential Revision: https://reviews.llvm.org/D26331

llvm-svn: 286345
2016-11-09 07:48:51 +00:00
Simon Pilgrim b3ad5f7ebf [X86][SSE] Reuse zeroable element mask in lowerVectorShuffleAsElementInsertion. NFCI
Don't regenerate a zeroable element mask with computeZeroableShuffleElements when its already available.

llvm-svn: 286067
2016-11-06 14:20:29 +00:00
Craig Topper 1162857ec4 [AVX-512] Lower AVX cvtpd2ps intrinsic to ISD::FP_ROUND so it can use EVEX instruction when available.
llvm-svn: 286057
2016-11-06 04:12:46 +00:00
Simon Pilgrim 4a9f210412 [X86][SSE] Reuse zeroable element mask in lowerVectorShuffleAsBlend. NFCI
Don't regenerate a zeroable element mask with computeZeroableShuffleElements when its already available.

llvm-svn: 286045
2016-11-05 18:31:57 +00:00
Simon Pilgrim 725174694a [X86][SSE] Reuse zeroable element mask in lowerVectorShuffleAsZeroOrAnyExtend. NFCI
Don't regenerate a zeroable element mask with computeZeroableShuffleElements when its already available.

llvm-svn: 286044
2016-11-05 18:22:13 +00:00
Simon Pilgrim 9f0afc6ae1 [X86][SSE] Reuse zeroable element mask in SSE4A EXTRQ/INSERTQ vector shuffle lowering. NFCI
Don't regenerate a zeroable element mask with computeZeroableShuffleElements when its already available.

llvm-svn: 286043
2016-11-05 18:05:13 +00:00
Simon Pilgrim 3cae21960e [X86][SSE] Reuse zeroable element mask in PSHUFB vector shuffle lowering. NFCI
Don't regenerate a zeroable element mask with computeZeroableShuffleElements when its already available.

llvm-svn: 286042
2016-11-05 17:53:27 +00:00
Simon Pilgrim 64a592d0a2 [X86][SSE] Reuse zeroable element mask in lowerVectorShuffleAsInsertPS. NFCI
Don't regenerate a zeroable element mask with computeZeroableShuffleElements when its already available.

llvm-svn: 286040
2016-11-05 17:27:48 +00:00
Simon Pilgrim 009befbd88 [X86][SSE] Reuse zeroable element mask in lowerVectorShuffleAsBitMask. NFCI
Don't regenerate a zeroable element mask with computeZeroableShuffleElements when its already available.

llvm-svn: 286039
2016-11-05 17:12:19 +00:00
Simon Pilgrim 1af0fc1103 [X86][SSE] Reuse zeroable element mask instead of regenerating it. NFCI
We are repeatedly calling computeZeroableShuffleElements in many shuffle lowering calls for the same shuffle mask/inputs.

This is a first step towards reusing the zeroable result, initially just for lowerVectorShuffleAsShift calls.

llvm-svn: 286037
2016-11-05 16:40:20 +00:00
Simon Pilgrim 1b4e1ac966 Strip trailing whitespace. NFCI.
llvm-svn: 286034
2016-11-05 14:43:04 +00:00
Zvi Rackover a455864fdf Refactor creation of X86ISD::SETCC nodes to a helper function. NFC.
llvm-svn: 285917
2016-11-03 14:25:24 +00:00
Elena Demikhovsky caaceef4b3 Expandload and Compressstore intrinsics
2 new intrinsics covering AVX-512 compress/expand functionality.
This implementation includes syntax, DAG builder, operation lowering and tests.
Does not include: handling of illegal data types, codegen prepare pass and the cost model.

llvm-svn: 285876
2016-11-03 03:23:55 +00:00
Michael Zuckerman 68a5c53616 [x86][inline-asm][AVX512][llvm][PART-2]
Introducing "k" and "Yk" constraints for extended inline assembly, enabling use of AVX512 masked vectorized instructions.

Commit on behalf of mharoush

Extending inline assembly support, compatible with GCC as folowing:
"k" constraint hints the compiler to select any of AVX512 k0-k7 registers.
"Yk" constraint is a subset of "k" excluding k0 which is not allowd to be used as a mask.

Reviewer: 1. rnk

Differential Revision: https://reviews.llvm.org/D25062

llvm-svn: 285591
2016-10-31 16:19:58 +00:00
Elena Demikhovsky 519b4ccd70 Fixed FMA + FNEG combine.
Masked form of FMA should be omitted in this optimization.

Differential Revision: https://reviews.llvm.org/D25984

llvm-svn: 285492
2016-10-29 08:44:46 +00:00
Simon Pilgrim 47c1ff7a43 [X86][AVX512DQ] Move v2i64 and v4i64 MUL lowering to tablegen
As suggested by @igorb on D26011

llvm-svn: 285313
2016-10-27 17:07:40 +00:00
Simon Pilgrim 820e1326d7 [X86][AVX512DQ] Improve lowering of MUL v2i64 and v4i64
With DQI but without VLX, lower v2i64 and v4i64 MUL operations with v8i64 MUL (vpmullq).

Updated cost table accordingly.

Differential Revision: https://reviews.llvm.org/D26011

llvm-svn: 285304
2016-10-27 15:27:00 +00:00
Zvi Rackover aa3402b41e [X86] AVX512 fallback for floating-point scalar selects
Summary:
In the case where of 'select i1 , f32, f32' or select i1, f64, f64 prefer lowering to masked-moves over branches.

Fixes pr30561

Reviewers: igorb, aymanmus, delena

Differential Revision: https://reviews.llvm.org/D25310

llvm-svn: 285196
2016-10-26 14:12:46 +00:00
Simon Pilgrim 5c3c9707c3 [X86][SSE] Add support for (V)PMOVSX* constant folding
We already have (V)PMOVZX* combining support, this is the beginning of handling (V)PMOVSX* similarly - other combines in combineVSZext can be generalized in future patches.

This unearthed an interesting bug in that we were generating illegal build vectors on 32-bit targets - it was proving difficult to create a test for it from PMOVZX, but it fired immediately with PMOVSX. I've created a more general form of the existing getConstVector to handle these cases - ideally this should be handled in non-target-specific code but I couldn't find an equivalent.

Differential Revision: https://reviews.llvm.org/D25874

llvm-svn: 285072
2016-10-25 14:29:25 +00:00
Craig Topper 01e4667e02 [AVX-512] Add support for creating SIGN_EXTEND_VECTOR_INREG and ZERO_EXTEND_VECTOR_INREG for 512-bit vectors to support vpmovzxbq and vpmovsxbq.
Summary: The one tricky thing about this is that the sign/zero_extend_inreg uses v64i8 as an input type which isn't legal without BWI support. Though the vpmovsxbq and vpmovzxbq instructions themselves don't require BWI. To support this we need to add custom lowering for ZERO_EXTEND_VECTOR_INREG with v64i8 input. This can mostly reuse the existing sign extend code with a couple checks for sign extend vs zero extend added.

Reviewers: delena, RKSimon

Subscribers: llvm-commits

Differential Revision: https://reviews.llvm.org/D25594

llvm-svn: 285053
2016-10-25 04:00:29 +00:00
Simon Pilgrim d3829c89bc [X86][AVX512VL] Added support for combining target 256-bit shuffles to AVX512VL VPERMV3
llvm-svn: 284922
2016-10-22 20:15:39 +00:00
Simon Pilgrim 56c0524f0f [X86][AVX512] Added support for combining target shuffles to AVX512 VPERMV3
llvm-svn: 284921
2016-10-22 19:53:59 +00:00
Craig Topper 7b2b8db438 [X86] Add support for lowering v4i64 and v8i64 shuffles directly to PALIGNR. I think shuffle combine can figure it out later, but we should try to get it right up front.
llvm-svn: 284914
2016-10-22 06:51:52 +00:00
Craig Topper 9f374533e3 [X86] Remove unnecessary AVX2 check that was already covered by an assertion earlier in the function. NFC
llvm-svn: 284913
2016-10-22 06:51:49 +00:00
Craig Topper bea5cb5491 [X86] Remove 128-bit lane handling from the main loop of matchVectorShuffleAsByteRotate. Instead check for is128LaneRepeatedSuffleMask before the loop and just loop over the repeated mask.
I plan to use the loop to support VALIGND/Q shuffles so this makes it easier to reuse.

llvm-svn: 284912
2016-10-22 06:51:44 +00:00
Simon Pilgrim 0d376bcbf0 [X86][SSE] Use getConstVector helper for VPERMV mask generation. NFCI.
llvm-svn: 284911
2016-10-22 06:18:36 +00:00
Peter Collingbourne e9bd49824d X86: Improve BT instruction selection for 64-bit values.
If a 64-bit value is tested against a bit which is known to be in the range
[0..31) (modulo 64), we can use the 32-bit BT instruction, which has a slightly
shorter encoding.

Differential Revision: https://reviews.llvm.org/D25862

llvm-svn: 284864
2016-10-21 19:57:55 +00:00
Simon Pilgrim ab48872313 [X86][AVX512BWVL] Added support for lowering v16i16 shuffles to AVX512BWVL vpermw
llvm-svn: 284863
2016-10-21 19:54:38 +00:00
Simon Pilgrim da814cba0d [X86][AVX512BWVL] Added support for combining target v16i16 shuffles to AVX512BWVL vpermw
llvm-svn: 284860
2016-10-21 19:40:29 +00:00
Simon Pilgrim 0109bf116f [X86][AVX512] Added support for combining target shuffles to AVX512 vpermpd/vpermq/vpermps/vpermd/vpermw
llvm-svn: 284858
2016-10-21 19:18:09 +00:00
Simon Pilgrim 2d96daa885 [X86] Use DAG::getBuildVector helper wrapper where possible. NFCI.
llvm-svn: 284835
2016-10-21 16:07:51 +00:00
Simon Pilgrim c98d99a600 [X86][AVX2] Begun generalizing lowering to VPERMD/VPERMPS in preparation for AVX512 support.
llvm-svn: 284823
2016-10-21 13:00:47 +00:00
Sanjay Patel 0051efcf97 [Target] remove TargetRecip class; 2nd try
This is a retry of r284495 which was reverted at r284513 due to use-after-scope bugs
caused by faulty usage of StringRef.

This version also renames a pair of functions:
getRecipEstimateDivEnabled()
getRecipEstimateSqrtEnabled()
as suggested by Eric Christopher.

original commit msg:

[Target] remove TargetRecip class; move reciprocal estimate isel functionality to TargetLowering

This is a follow-up to https://reviews.llvm.org/D24816 - where we changed reciprocal estimates to be function attributes
rather than TargetOptions.

This patch is intended to be a structural, but not functional change. By moving all of the
TargetRecip functionality into TargetLowering, we can remove all of the reciprocal estimate
state, shield the callers from the string format implementation, and simplify/localize the
logic needed for a target to enable this.

If a function has a "reciprocal-estimates" attribute, those settings may override the target's
default reciprocal preferences for whatever operation and data type we're trying to optimize.
If there's no attribute string or specific setting for the op/type pair, just use the target
default settings.

As noted earlier, a better solution would be to move the reciprocal estimate settings to IR
instructions and SDNodes rather than function attributes, but that's a multi-step job that
requires infrastructure improvements. I intend to work on that, but it's not clear how long
it will take to get all the pieces in place.

Differential Revision: https://reviews.llvm.org/D25440

llvm-svn: 284746
2016-10-20 16:55:45 +00:00
Peter Collingbourne de1f039360 X86: Deduplicate some lowering code. NFCI.
llvm-svn: 284686
2016-10-20 01:21:26 +00:00
Craig Topper a4dc340cf2 [AVX-512] Teach isel lowering that a subvector broadcast being inserted into both halves of a 512-bit vector can be combined into a larger subvector broadcast.
Summary:
This allows us to create broadcasts of 128-bit vector loads into 512-bit vectors.

New patterns added to support 8-bit and 16-bit vector types and v2f64/v2i64->v8f64/v8i64 without DQI instructions.

There also fallback patterns when the load can't be folded. These patterns are a little complex as we first need to insert the lower 128-bits into the second 128-bits using a zmm subvector insert instruction. We need to use a zmm insert in case VLX isn't available. Then use another zmm sub vector insert to take those 256-bits and insert them into the upper bits. Since we used a zmm insert to create the 256-bits we also need to do a extract_subreg to get just the lower 256-bits to pass to the second insert.

The outer insert for the fallback patterns should have its type correct because eventually we should also supported masked operations here too. So we need a DQI and a NoDQI version of the v16f32/v16i32 patterns.

Reviewers: RKSimon, delena, igorb

Subscribers: llvm-commits

Differential Revision: https://reviews.llvm.org/D25651

llvm-svn: 284567
2016-10-19 04:44:17 +00:00
Benjamin Kramer 4c2582ad78 Reduce global namespace pollution. NFC.
llvm-svn: 284521
2016-10-18 19:39:31 +00:00
Sanjay Patel 19601fa587 revert r284495: [Target] remove TargetRecip class
There's something wrong with the StringRef usage while parsing the attribute string.

llvm-svn: 284513
2016-10-18 18:36:49 +00:00
Sanjay Patel 08fff9ca81 [Target] remove TargetRecip class; move reciprocal estimate isel functionality to TargetLowering
This is a follow-up to D24816 - where we changed reciprocal estimates to be function attributes
rather than TargetOptions.

This patch is intended to be a structural, but not functional change. By moving all of the
TargetRecip functionality into TargetLowering, we can remove all of the reciprocal estimate
state, shield the callers from the string format implementation, and simplify/localize the
logic needed for a target to enable this.

If a function has a "reciprocal-estimates" attribute, those settings may override the target's
default reciprocal preferences for whatever operation and data type we're trying to optimize.
If there's no attribute string or specific setting for the op/type pair, just use the target
default settings.

As noted earlier, a better solution would be to move the reciprocal estimate settings to IR
instructions and SDNodes rather than function attributes, but that's a multi-step job that
requires infrastructure improvements. I intend to work on that, but it's not clear how long
it will take to get all the pieces in place.

Differential Revision: https://reviews.llvm.org/D25440

llvm-svn: 284495
2016-10-18 17:05:05 +00:00
Simon Pilgrim 4ddc92b6cd [X86][SSE] Add lowering to cvttpd2dq/cvttps2dq for sitofp v2f64/2f32 to 2i32
As discussed on PR28461 we currently miss the chance to lower "fptosi <2 x double> %arg to <2 x i32>" to cvttpd2dq due to its use of illegal types.

This patch adds support for fptosi to 2i32 from both 2f64 and 2f32.

It also recognises that cvttpd2dq zeroes the upper 64-bits of the xmm result (similar to D23797) - we still don't do this for the cvttpd2dq/cvttps2dq intrinsics - this can be done in a future patch.

Differential Revision: https://reviews.llvm.org/D23808

llvm-svn: 284459
2016-10-18 07:42:15 +00:00
Craig Topper 448358b5f1 [X86] Fix DecodeVPERMVMask to handle cases where the constant pool entry has a different type than the shuffle itself.
This is especially important for 32-bit targets with 64-bit shuffle elements.

llvm-svn: 284453
2016-10-18 04:48:33 +00:00
Craig Topper 7268bf99ab [AVX-512] Fix DecodeVPERMV3Mask to handle cases where the constant pool entry has a different type than the shuffle itself.
Summary: This is especially important for 32-bit targets with 64-bit shuffle elements.This is similar to how PSHUFB and VPERMIL handle the same problem.

Reviewers: RKSimon

Subscribers: llvm-commits

Differential Revision: https://reviews.llvm.org/D25666

llvm-svn: 284451
2016-10-18 04:00:32 +00:00
Craig Topper 5b24cd31f5 [AVX-512] Add shuffle combining support for vpermi2var shuffles derived from existing support for vpermt2var.
llvm-svn: 284357
2016-10-17 04:26:47 +00:00
Craig Topper 715ad7fef5 [AVX-512] Add support for turning a 256-bit load that goes to both halfs of an insert_subvector into a subvector broadcast.
Differential Revision: https://reviews.llvm.org/D25650

llvm-svn: 284353
2016-10-16 23:29:51 +00:00
Konstantin Zhuravlyov 8ea0246e93 [MachineMemOperand] Move synchronization scope and atomic orderings from SDNode to MachineMemOperand, and remove redundant getAtomic* member functions from SelectionDAG.
Differential Revision: https://reviews.llvm.org/D24577

llvm-svn: 284312
2016-10-15 22:01:18 +00:00
David L Kreitzer d5c6755d83 [safestack] Use non-thread-local unsafe stack pointer for Contiki OS
Patch by Michael LeMay

Differential revision: http://reviews.llvm.org/D19852

llvm-svn: 284254
2016-10-14 17:56:00 +00:00
Pierre Gousseau b6d652adb5 [X86] Take advantage of the lzcnt instruction on btver2 architectures when ORing comparisons to zero.
This change adds transformations such as:
  zext(or(setcc(eq, (cmp x, 0)), setcc(eq, (cmp y, 0))))
  To:
  srl(or(ctlz(x), ctlz(y)), log2(bitsize(x))
This optimisation is beneficial on Jaguar architecture only, where lzcnt has a good reciprocal throughput.
Other architectures such as Intel's Haswell/Broadwell or AMD's Bulldozer/PileDriver do not benefit from it.
For this reason the change also adds a "HasFastLZCNT" feature which gets enabled for Jaguar.

Differential Revision: https://reviews.llvm.org/D23446

llvm-svn: 284248
2016-10-14 16:41:38 +00:00
Saleem Abdulrasool 7705c4f1be CodeGen: use MSVC division on windows itanium
Windows itanium is identical to MSVC when dealing with everything but C++.
Lower the math routines into msvcrt rather than compiler-rt.

llvm-svn: 284175
2016-10-13 23:00:11 +00:00
Saleem Abdulrasool 06383dd272 CodeGen: adjust floating point operations in Windows itanium
Windows itanium is equivalent to MSVC except in C++ mode.  Ensure that the
promote the 32-bit floating point operations to their 64-bit equivalences.

llvm-svn: 284173
2016-10-13 22:38:15 +00:00
Igor Breger 8409c356ad [X86][AVX512] Fix sext v32i1 -> v32i8 lowering.
Fix PR30600.

Differential Revision: https://reviews.llvm.org/D25554

llvm-svn: 284134
2016-10-13 17:20:38 +00:00
Daniel Jasper bee9dea306 Silence unused warning in non-assert builds.
llvm-svn: 284107
2016-10-13 06:39:44 +00:00
Craig Topper ff23af4299 [AVX-512] Teach shuffle lowering to recognize 512-bit zero extends.
llvm-svn: 284105
2016-10-13 05:29:41 +00:00
Craig Topper 8cb2efa58a [X86] Simplify the lowering code for extracting and inserting subvectors.
We don't need to check if AVX is enabled. It's implied by the operation action being set to Custom.
We don't need to check both the input and output type widths. We only need to check the type that's being inserted or extracted. The other type is known to be a legal type and we can assume its a different width.

llvm-svn: 284102
2016-10-13 04:14:47 +00:00
Albert Gutowski 795d7d6381 Create llvm.addressofreturnaddress intrinsic
Summary: We need a new LLVM intrinsic to implement MS _AddressOfReturnAddress builtin on 64-bit Windows.

Reviewers: majnemer, rnk

Subscribers: llvm-commits

Differential Revision: https://reviews.llvm.org/D25293

llvm-svn: 284061
2016-10-12 22:13:19 +00:00
Michael Zuckerman 3eeac2d56b [x86][inline-asm][llvm] accept 'v' constraint
Commit in the name of:Coby Tayree
1.'v' constraint for (x86) non-avx arch imitates the already implemented 'x' constraint, i.e. allows XMM{0-15} & YMM{0-15} depending on the apparent arch & mode (32/64).
2.for the avx512 arch it allows [X,Y,Z]MM{0-31} (mode dependent)

This patch applies the needed changes to clang
 clang patch: https://reviews.llvm.org/D25004

Differential Revision: D25005
 

llvm-svn: 283717
2016-10-10 05:48:56 +00:00
Elena Demikhovsky 5b10aa1f1e DAG: Setting Masked-Expand-Load as a variant of Masked-Load node
Masked-expand-load node represents load operation that loads a variable amount of elements from memory according to amount of "true" bits in the mask and expands the loaded elements according to their position in the mask vector.
Right now, the node is used in intrinsics for VEXPAND* instructions. 
The work is done towards implementation of masked.expandload and masked.compressstore intrinsics.

Differential Revision: https://reviews.llvm.org/D25322

llvm-svn: 283694
2016-10-09 10:48:52 +00:00
Sanjay Patel bfdbea6481 [Target] move reciprocal estimate settings from TargetOptions to TargetLowering
The motivation for the change is that we can't have pseudo-global settings for
codegen living in TargetOptions because that doesn't work with LTO.

Ideally, these reciprocal attributes will be moved to the instruction-level via
FMF, metadata, or something else. But making them function attributes is at least
an improvement over the current state.

The ingredients of this patch are:

    Remove the reciprocal estimate command-line debug option.
    Add TargetRecip to TargetLowering.
    Remove TargetRecip from TargetOptions.
    Clean up the TargetRecip implementation to work with this new scheme.
    Set the default reciprocal settings in TargetLoweringBase (everything is off).
    Update the PowerPC defaults, users, and tests.
    Update the x86 defaults, users, and tests.

Note that if this patch needs to be reverted, the related clang patch checked in
at r283251 should be reverted too.

Differential Revision: https://reviews.llvm.org/D24816

llvm-svn: 283252
2016-10-04 20:46:43 +00:00
Sanjay Patel d27a21874b [x86, SSE/AVX] allow 128/256-bit lowering for copysign vector intrinsics (PR30433)
This should fix:
https://llvm.org/bugs/show_bug.cgi?id=30433

There are a couple of open questions about the codegen:
1. Should we let scalar ops be scalars and avoid vector constant loads/splats?
2. Should we have a pass to combine constants such as the inverted pair that we have here?

Differential Revision: https://reviews.llvm.org/D25165
 

llvm-svn: 283119
2016-10-03 16:38:27 +00:00
Simon Pilgrim a8d2168cb0 [X86][AVX2] Add support for combining target shuffles to VPERMD/VPERMPS
llvm-svn: 283080
2016-10-02 21:07:58 +00:00
Simon Pilgrim 03afbe783d [X86][AVX] Ensure broadcast loads respect dependencies
To allow broadcast loads of a non-zero'th vector element, lowerVectorShuffleAsBroadcast can replace a load with a new load with an adjusted address, but unfortunately we weren't ensuring that the new load respected the same dependencies.

This patch adds a TokenFactor and updates all dependencies of the old load to reference the new load instead.

Bug found during internal testing.

Differential Revision: https://reviews.llvm.org/D25039

llvm-svn: 283070
2016-10-02 15:59:15 +00:00
Craig Topper 46413af7f7 [X86] Don't set i64 ADDC/ADDE/SUBC/SUBE as Custom if the target isn't 64-bit. This way we don't have to catch them and do nothing with them in ReplaceNodeResults.
llvm-svn: 283066
2016-10-02 06:13:43 +00:00
Craig Topper 68c08931fc [X86] Fix indentation. NFC
llvm-svn: 283065
2016-10-02 06:13:40 +00:00
Simon Pilgrim 5b0c15ddf7 Fix signed/unsigned warning
llvm-svn: 283041
2016-10-01 16:14:57 +00:00
Simon Pilgrim 1638d49f20 [X86][SSE] Add support for combining target shuffles to binary BLEND
We already had support for 1-input BLEND with zero - this adds support for 2-input BLEND as well.

llvm-svn: 283040
2016-10-01 16:04:28 +00:00
Simon Pilgrim ae17cf20ce [X86][SSE] Always combine target shuffles to MOVSD/MOVSS
Now we can commute to BLENDPD/BLENDPS on SSE41+ targets if necessary, so simplify the combine matching where we can.

This required me to add a couple of scalar math movsd/moss fold patterns that hadn't been needed in the past.

llvm-svn: 283038
2016-10-01 15:33:01 +00:00
Craig Topper 3f37a4180b Revert r282835 "[AVX-512] Always use the full 32 register vector classes for addRegisterClass regardless of whether AVX512/VLX is enabled or not."
Turns out this doesn't pass verify-machineinstrs.

llvm-svn: 282841
2016-09-30 05:35:42 +00:00
Craig Topper de03ff7063 [X86] Add AVX-512 VTs to findRepresentativeClass as well as v16i16 which was also missing. Change register class to include the extra 16 AVX512 registers.
I'm not completely sure what this method does or why all the 256-bit VTs returned VR128RegClass when the comments on the method definiton say it should return the largest super register class. I just figured AVX-512 should be similar.

llvm-svn: 282836
2016-09-30 04:31:37 +00:00
Craig Topper bc6e97b8f4 [AVX-512] Always use the full 32 register vector classes for addRegisterClass regardless of whether AVX512/VLX is enabled or not.
If AVX512 is disabled, the registers should already be marked reserved. Pattern predicates and register classes on instructions should take care of most of the rest. Loads/stores and physical register copies for XMM16-31 and YMM16-31 without VLX have already been taken care of.

I'm a little unclear why this changed the register allocation of the SSE2 run of the sad.ll test, but the registers selected appear to be valid after this change.

llvm-svn: 282835
2016-09-30 04:31:33 +00:00
Simon Pilgrim 55b8eaa505 Strip trailing whitespace
llvm-svn: 282579
2016-09-28 11:08:00 +00:00
Sanjay Patel 764ae8bd72 [x86] add folds for FP logic with vector zeros
The 'or' case shows up in copysign. The copysign code also had 
redundant checking for a scalar zero operand with 'and', so I 
removed that. 

I'm not sure how to test vector 'and', 'andn', and 'xor' yet, 
but it seems better to just include all of the logic ops since
we're fixing 'or' anyway.

llvm-svn: 282546
2016-09-27 22:28:13 +00:00
Sanjay Patel 43ef1ad0ba [x86] use isNullFPConstant(); NFCI
Also, put the related FP logic functions together to see the similarities. 

llvm-svn: 282522
2016-09-27 18:48:02 +00:00
Ayman Musa d7a5ed4141 [X86][avx512] Fix bug in masked compress store.
Differential Revision: https://reviews.llvm.org/D23984

llvm-svn: 282381
2016-09-26 06:22:08 +00:00
Craig Topper 87155274b8 [X86] Remove what appears to be leftover MMX code involving (v1i64 scalar_to_vector).
llvm-svn: 282361
2016-09-25 16:34:11 +00:00
Craig Topper 8f2e85e669 [AVX-512] Don't use two opcodes for INTR_TYPE_SCALAR_MASK_RM. The handling was such that if the second opcode was present the first was ingored, so we can just have one opcode.
llvm-svn: 282344
2016-09-25 01:03:10 +00:00
Craig Topper 1776f4c965 [X86] Teach combineShuffle to avoid creating floating point operations with integer types and integer operations with floating point types. Seems isOperationLegal lies for mismatched types and operations.
Fixes PR30511.

llvm-svn: 282341
2016-09-24 21:42:49 +00:00
Craig Topper aeca0460f3 [AVX-512] Split scalar version of X86ISD::SELECT into a separate opcode because isel is not robust with multiple type profiles for the same opcode.
llvm-svn: 282340
2016-09-24 21:42:47 +00:00
Sanjay Patel 752ad8fde7 [x86] don't try to create a vector integer inst for an SSE1 target (PR30512)
This bug was introduced with:
http://reviews.llvm.org/rL272511

We need to restrict the lowering to v4f32 comparisons because that's all SSE1 can handle.

This should fix:
https://llvm.org/bugs/show_bug.cgi?id=28044

llvm-svn: 282336
2016-09-24 20:24:06 +00:00
Sanjay Patel 0b36337d61 [x86] fix FCOPYSIGN lowering to create constants instead of ConstantPool loads
This is similar to:
https://reviews.llvm.org/rL279958

By not prematurely lowering to loads, we should be able to more easily eliminate
the 'or' with zero instructions seen in copysign-constant-magnitude.ll.

We should also be able to extend this code to handle vectors.

llvm-svn: 282312
2016-09-23 23:17:29 +00:00
Craig Topper a02e394872 [AVX-512] Split X86ISD::VFPROUND and X86ISD::VFPEXT into separate opcodes for each type constraint.
This revealed that scalar intrinsics could create nodes with a rounding mode of FROUND_CUR_DIRECTION, but the patterns didn't check for it. It just worked because isel doesn't check operand count and we had a pattern without the rounding mode argument at all.

llvm-svn: 282231
2016-09-23 06:24:43 +00:00
Craig Topper 3174b6e467 [AVX-512] Add separate ISD opcodes for each form of CVT instructions. Don't reuse non-X86 ISD opcodes with extra X86 specific arguments.
llvm-svn: 282230
2016-09-23 06:24:39 +00:00
Craig Topper d70ec9b25e [AVX-512] Use different ISD opcodes for some of the scalar intrinsic lowering. Isel is not very robust against using the same ISD opcode with different number of operands so its better to separate.
llvm-svn: 282229
2016-09-23 06:24:35 +00:00
Arnold Schwaighofer 0fd32c005b i386 does not support optimized swifterror handling
rdar://28432565

llvm-svn: 282186
2016-09-22 20:06:25 +00:00
Craig Topper 29f1a1f834 [AVX-512] Split the 3 different usages of the X86ISD::FSETCC opcode into 3 different opcodes.
It turns out isel is really not robust against having different type profiles for the same opcode. It turns out that if you put an illegal rounding mode(i.e. not CUR_DIRECTION or NO_EXC) on a comiss intrinsic we would generate the FSETCC form with the rounding mode added, but then pattern match to an instruction with ROUND_CUR_DIRECTION.

We can probably get away with just one FSETCCM opcode that always contains the rounding mode and explicitly put ROUND_CUR_DIRECTION in the pattern, but I'll leave that for future work.

With this change the clang tests for the comiss intrinsics that used an incorrect rounding mode of 3 properly fail isel instead of silently doing the wrong thing. Those clang tests will be fixed in a follow up commit and I also plan to add rounding mode checking to clang.

llvm-svn: 282055
2016-09-21 06:37:54 +00:00
Craig Topper a27f54b4d9 [AVX-512] Simplify handling of INTR_TYPE_1OP_MASK_RM to remove support for the second opcode since its never used. This makes it consistent with INTR_TYPE_2OP_MASK_RM and INTR_TYPE_3OP_MASK_RM.
And even if it was used we were passing the same operands to both so it wouldn't make sense to have two opcodes.

llvm-svn: 282051
2016-09-21 03:58:41 +00:00
Craig Topper e18258dc1c [AVX-512] Don't lower avx512 vcvtps2ph/vcvtph2ps nodes to ISD::FP16_TO_FP/ISD::FP_TO_FP16 with an extra x86 specific rounding mode operand. We should use a target specific ISD opcode.
llvm-svn: 282046
2016-09-21 02:05:22 +00:00
Elena Demikhovsky d3ff7c288b AVX-512: Fixed a bug in lowering saturated operations on KNL.
The generated code is still not optimal.

Differential Revision: https://reviews.llvm.org/D24723

llvm-svn: 281966
2016-09-20 11:02:26 +00:00
Craig Topper 9820e341f9 [AVX-512] Use 512-bit vcvtps2ph/vcvtph2ps to implement fp_to_f16/f16_to_fp when F16C and VLX are not supported.
Fixes PR23941.

llvm-svn: 281958
2016-09-20 05:44:47 +00:00
Sanjay Patel e97f7947b1 [x86] fix variable names; NFC
llvm-svn: 281953
2016-09-20 00:27:22 +00:00
Sanjay Patel 0fa3365923 [x86] use getSignBit() to simplify code; NFCI
llvm-svn: 281944
2016-09-19 22:07:27 +00:00
Craig Topper b3b5033179 [AVX-512] Add support for lowering fp_to_f16 and f16_to_fp when VLX is supported regardless of whether F16C is also supported.
Still need to add support for lowering using AVX512F when neither VLX or F16C is supported.

llvm-svn: 281884
2016-09-19 02:53:37 +00:00
Craig Topper af5ee86bc9 [AVX-512] Don't lower CVTPD2PS intrinsics to ISD::FP_ROUND with an X86 rounding mode encoding in the second operand. This immediate should only be 0 or 1 and indicates if the truncation loses precision.
Also enhance an assert in SelectionDAG::getNode to flag this sort of problem in the future.

llvm-svn: 281868
2016-09-18 21:49:32 +00:00
Craig Topper cc03165d3f [X86] Fix typo in comment. NFC
llvm-svn: 281862
2016-09-18 18:59:38 +00:00
Simon Pilgrim 6c21e6a54e [X86][SSE] Improve recognition of uitofp conversions that can be performed as sitofp
With D24253 we can now use SelectionDAG::SignBitIsZero with vector operations.

This patch uses SelectionDAG::SignBitIsZero to recognise that a zero sign bit means that we can use a sitofp instead of a uitofp (which is not directly support on pre-AVX512 hardware).

While AVX512 does provide support for uitofp, the conversion to sitofp should not cause any regressions.

Differential Revision: https://reviews.llvm.org/D24343

llvm-svn: 281852
2016-09-18 12:45:23 +00:00
Simon Pilgrim 6736096ac3 [X86][SSE] Improve target shuffle mask extraction
Add ability to extract vXi64 'vzext_movl' masks on 32-bit targets

llvm-svn: 281834
2016-09-17 18:50:54 +00:00
Sanjay Patel c531c9ebf5 [x86] fix formatting; NFC
llvm-svn: 281504
2016-09-14 17:23:18 +00:00
Simon Pilgrim a369219ce6 [X86][SSE] Improve recognition of i64 sitofp conversions that can be performed as i32 (PR29078)
Until AVX512DQ we only support i64/vXi64 sitofp conversion as scalars.

This patch sees if the sign bit extends far enough that we can truncate to a i32 type and then perform sitofp without loss of precision.

Differential Revision: https://reviews.llvm.org/D24345

llvm-svn: 281502
2016-09-14 17:15:26 +00:00
Simon Pilgrim fbbb28ebb3 [X86][SSE] Don't use PSHUFD directly - lower with generic shuffle
Remove the last user of the old getTargetShuffleNode helpers

llvm-svn: 281499
2016-09-14 17:04:22 +00:00
Sanjay Patel 1ed771f5d7 getVectorElementType().getSizeInBits() -> getScalarSizeInBits() ; NFCI
llvm-svn: 281495
2016-09-14 16:37:15 +00:00
Sanjay Patel b1f0a0f4a8 getValueType().getSizeInBits() -> getValueSizeInBits() ; NFCI
llvm-svn: 281493
2016-09-14 16:05:51 +00:00
Sanjay Patel 5f6bb6cd24 getValueType().getScalarSizeInBits() -> getScalarValueSizeInBits() ; NFCI
llvm-svn: 281490
2016-09-14 15:43:44 +00:00
Sanjay Patel bd6fca1419 getScalarType().getSizeInBits() -> getScalarSizeInBits() ; NFCI
llvm-svn: 281489
2016-09-14 15:21:00 +00:00
Simon Pilgrim ec2d206669 [X86][SSE] Removed unused getTargetShuffleNode function
llvm-svn: 281481
2016-09-14 14:30:00 +00:00
Simon Pilgrim ba325e3a73 [X86][SSE] Don't blend vector shifts with MOVSS/MOVSD directly, lower from generic shuffle
Shuffle lowering will correctly lower to MOVSS/MOVSD/PBLEND, improving commutation opportunities

llvm-svn: 281471
2016-09-14 14:08:18 +00:00
Elena Demikhovsky 0569d9d588 AVX-512: Fixed a bug in kortest.z intrinsic
Lowering was wrong - X86ISD::SETCC node should return i8 type.

llvm-svn: 281446
2016-09-14 08:06:54 +00:00
Igor Breger 74813fc19c [AVX512BW] Change truncStore action (v16i16->v16i18). It can be legal only with AVX512VL.
Differential Revision: http://reviews.llvm.org/D24547

llvm-svn: 281445
2016-09-14 08:04:28 +00:00
Simon Pilgrim a3d1e03cd7 [X86][XOP] Fix VPERMIL2PD mask creation on 32-bit targets
Use getConstVector helper to correctly create v2i64/v4i64 constants on 32-bit targets

llvm-svn: 281105
2016-09-09 21:47:21 +00:00
Hans Wennborg c39ef776fc Win64: Don't use REX prefix for direct tail calls
The REX prefix should be used on indirect jmps, but not direct ones.
For direct jumps, the unwinder looks at the offset to determine if
it's inside the current function.

Differential Revision: https://reviews.llvm.org/D24359

llvm-svn: 281003
2016-09-08 23:35:10 +00:00
Wei Mi f100d4e93d Don't reduce the width of vector mul if the target doesn't support SSE2.
The patch is to fix PR30298, which is caused by rL272694. The solution is to
bail out if the target has no SSE2.

Differential Revision: https://reviews.llvm.org/D24288

llvm-svn: 280837
2016-09-07 18:22:17 +00:00
Sanjay Patel 0bf9a99c7d [x86] move combines of 'select of 2 constants' to its own function; NFC
There are missing folds here and possibly folds that could be made generic.

llvm-svn: 280817
2016-09-07 15:47:34 +00:00
Elena Demikhovsky f0ddd1b8b5 AVX512F: FMA intrinsic + FNEG - sequence optimization
The previous commit (r280368 - https://reviews.llvm.org/D23313) does not cover AVX-512F, KNL set.
FNEG(x) operation is lowered to (bitcast (vpxor (bitcast x), (bitcast constfp(0x80000000))).
It happens because FP XOR is not supported for 512-bit data types on KNL and we use integer XOR instead.
I added pattern match for integer XOR.

Differential Revision: https://reviews.llvm.org/D24221

llvm-svn: 280785
2016-09-07 06:54:28 +00:00
Craig Topper 4fa3b50fc3 [AVX-512] Fix masked VPERMI2PS isel when the index comes from a bitcast.
We need to bitcast the index operand to a floating point type so that it matches the result type. If not then the passthru part of the DAG will be a bitcast from the index's original type to the destination type. This makes it very difficult to match. The other option would be to add 5 sets of patterns for every other possible type.

llvm-svn: 280696
2016-09-06 06:56:59 +00:00
Craig Topper 43fbd840dd [X86] Remove unused encoding from IntrinsicType enum.
llvm-svn: 280694
2016-09-06 05:45:24 +00:00