Commit Graph

Craig Topper ff9d45875a [X86][FMA4] Add load folding support for FMA4 scalar intrinsic instructions.
llvm-svn: 288009
2016-11-27 21:37:00 +00:00
Craig Topper 3674f44e40 [X86] Add SHL by 1 to the load folding tables.
I don't think isel selects these today, favoring adding the register to itself instead. But the load folding tables shouldn't be concerned with what isel will use; they should just represent the relationships.

llvm-svn: 288007
2016-11-27 21:36:54 +00:00
Simon Pilgrim 91d6f5fbc1 [X86][SSE] Add support for combining target shuffles to 128/256-bit PSLL/PSRL bit shifts
llvm-svn: 288006
2016-11-27 21:08:19 +00:00
Craig Topper 4fab487265 [AVX-512] Add integer and fp unpck instructions to load folding tables.
llvm-svn: 288004
2016-11-27 19:51:41 +00:00
Simon Pilgrim cdb2ce661d [X86][SSE] Split lowerVectorShuffleAsShift ready for combines. NFCI.
Moved most of matching code into matchVectorShuffleAsShift to share with target shuffle combines (in a future commit).

llvm-svn: 288003
2016-11-27 19:28:39 +00:00
Craig Topper 7ad961cc70 [X86] Add TB_NO_REVERSE to entries in the load folding table where the instruction's load size is smaller than the register size.
If we were to unfold these, the load size would be increased to the register size. This is not safe to do since the enlarged load can do things like cross a page boundary into a page that doesn't exist.

I probably missed some instructions, but this should be a large portion of them.

llvm-svn: 288001
2016-11-27 18:51:13 +00:00
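
A minimal sketch of the unfolding-safety rule described in the commit above, assuming a toy table layout rather than LLVM's actual folding-table structures; the NoReverse flag below plays the role of TB_NO_REVERSE:

    // Toy model of a load-folding table entry (not LLVM's real tables).
    struct FoldEntry {
      unsigned RegOpc;    // register-form opcode
      unsigned MemOpc;    // memory-form opcode
      unsigned LoadBytes; // bytes the memory form actually reads
      unsigned RegBytes;  // width of the register operand in bytes
      bool NoReverse;     // set when LoadBytes < RegBytes
    };

    // Unfolding (memory form -> separate load + register form) is only safe
    // when the separate load would read exactly what the folded form read;
    // a widened load could cross a page boundary into an unmapped page.
    bool canUnfold(const FoldEntry &E) {
      return !E.NoReverse && E.LoadBytes == E.RegBytes;
    }
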
Craig Topper c3b3926f8b [AVX-512] Add masked EVEX vpmovzx/sx instructions to load folding tables.
llvm-svn: 287995
2016-11-27 08:55:31 +00:00
Craig Topper fb64a25ba1 [X86] Remove alignment restrictions from load folding table for some instructions that don't have a restriction.
Most of these are the SSE4.1 PMOVZX/PMOVSX instructions, which all read less than 128 bits. The only other was MOVUPD, which by definition is an unaligned load.

llvm-svn: 287991
2016-11-27 01:52:51 +00:00
Craig Topper 837ff25da1 [X86] Remove hasOneUse check that is redundant with the one in IsProfitableToFold.
llvm-svn: 287987
2016-11-26 18:43:26 +00:00
Craig Topper e266e126ff [X86] Fix the zero extending load detection in X86DAGToDAGISel::selectScalarSSELoad to pass the load node to IsProfitableToFold and IsLegalToFold.
Previously we were passing the SCALAR_TO_VECTOR node.

llvm-svn: 287986
2016-11-26 18:43:24 +00:00
Craig Topper d3ab1a3905 [X86] Simplify control flow. NFCI
llvm-svn: 287985
2016-11-26 18:43:21 +00:00
Craig Topper 991d1ca3ba [X86] Add a hasOneUse check to selectScalarSSELoad to keep the same load from being folded multiple times.
Summary: When selectScalarSSELoad is looking for a scalar_to_vector of a scalar load, it makes sure the load is only used by the scalar_to_vector. But it doesn't make sure the scalar_to_vector is only used once. This can cause the same load to be folded multiple times. This can be bad for performance. This also causes the chain output to be duplicated, but not connected to anything so chain dependencies will not be satisfied.

Reviewers: RKSimon, zvi, delena, spatel

Subscribers: andreadb, llvm-commits

Differential Revision: https://reviews.llvm.org/D26790

llvm-svn: 287983
2016-11-26 17:29:25 +00:00
Craig Topper 10d5eec1a1 [AVX-512] Add unmasked EVEX vpmovzx/sx instructions to load folding tables.
llvm-svn: 287975
2016-11-26 08:21:52 +00:00
Craig Topper 97169ea5f9 [AVX-512] Add masked 128/256-bit integer add/sub instructions to load folding tables.
llvm-svn: 287974
2016-11-26 08:21:48 +00:00
Craig Topper 53b33de1e3 [AVX-512] Add masked 512-bit integer add/sub instructions to load folding tables.
llvm-svn: 287972
2016-11-26 07:21:00 +00:00
Craig Topper 6677bb4e50 [AVX-512] Teach LowerFormalArguments to use the extended register class when available. Fix the avx512vl stack folding tests to clobber more registers, as otherwise they use xmm16 after this change.
llvm-svn: 287971
2016-11-26 07:20:57 +00:00
Craig Topper 39265bb1ce [AVX-512] Add VLX versions of VDIVPD/PS and VMULPD/PS to load folding tables.
llvm-svn: 287970
2016-11-26 07:20:53 +00:00
Craig Topper 7f76c23781 [X86][XOP] Add a reversed reg/reg form for VPROT instructions.
The W bit distinguishes which operand is the memory operand. But if the mod bits are 3 then the memory operand is a register and there are two possible encodings. We already did this correctly for several other XOP instructions.

llvm-svn: 287961
2016-11-26 02:14:00 +00:00
Craig Topper 516fd7abfe [X86] Add SSE, AVX, and AVX2 version of MOVDQU to the load/store folding tables for consistency.
Not sure this is truly needed but we had the floating point equivalents, the aligned equivalents, and the EVEX equivalents. So this just makes it complete.

llvm-svn: 287960
2016-11-26 02:13:58 +00:00
Craig Topper a363d42973 [AVX-512] Put the AVX-512 sections of the load folding tables into mostly alphabetical order. This is consistent with the older sections of the table. NFC
llvm-svn: 287956
2016-11-25 23:21:34 +00:00
Simon Pilgrim 8e8ae7219f Use SDValue helper instead of explicitly going via SDValue::getNode(). NFCI
llvm-svn: 287940
2016-11-25 17:19:53 +00:00
Craig Topper 88071b37ab [AVX-512] Add support for changing VSHUFF64x2 to VSHUFF32x4 when it's feeding a vselect with 32-bit element size.
Summary:
Shuffle lowering may have widened the element size of an i32 shuffle to i64 before selecting X86ISD::SHUF128. If this shuffle was used by a vselect, this can prevent us from selecting masked operations.

This patch detects this and changes the element size to match the vselect.

I don't handle changing integer to floating point or vice versa as it's not clear whether it's better to push such a bitcast to the inputs of the shuffle or to the user of the vselect. So I'm ignoring that case for now.

Reviewers: delena, zvi, RKSimon

Subscribers: llvm-commits

Differential Revision: https://reviews.llvm.org/D27087

llvm-svn: 287939
2016-11-25 16:48:05 +00:00
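
A sketch of the element-size change described in the commit above, assuming plain std::vector masks rather than LLVM's mask representation: each 64-bit-element index expands into the pair of 32-bit-element indices covering the same bytes.

    #include <vector>

    // Narrow a 64-bit-element shuffle mask to the equivalent 32-bit-element
    // mask; -1 marks an undefined lane and stays undefined in both halves.
    std::vector<int> narrowShuffleMask(const std::vector<int> &Wide) {
      std::vector<int> Narrow;
      for (int Idx : Wide) {
        Narrow.push_back(Idx < 0 ? -1 : 2 * Idx);
        Narrow.push_back(Idx < 0 ? -1 : 2 * Idx + 1);
      }
      return Narrow;
    }
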
Craig Topper 1e48829747 [AVX-512] Add VPERMT2* and VPERMI2* instructions to load folding tables.
llvm-svn: 287937
2016-11-25 16:33:53 +00:00
Craig Topper d4091494d3 [X86] Invert an 'if' and early out to fix a weird indentation. NFCI
llvm-svn: 287909
2016-11-25 02:29:24 +00:00
Craig Topper a46936185a [X86] Size a SmallVector to the worst case mask size for a 512-bit shuffle. NFCI
llvm-svn: 287908
2016-11-25 02:29:21 +00:00
Simon Pilgrim f1ee930db0 Fix unused variable warning
llvm-svn: 287889
2016-11-24 15:24:47 +00:00
Benjamin Kramer fc54e35d94 [X86] Don't round trip a unique_ptr through a raw pointer for assignment.
No functional change.

llvm-svn: 287888
2016-11-24 15:17:39 +00:00
Simon Pilgrim 9c71e07276 [X86][SSE] Improve UINT_TO_FP v2i32 -> v2f64
Vectorize UINT_TO_FP v2i32 -> v2f64 instead of scalarization (albeit still on the SIMD unit).

The codegen matches that generated by legalization (and is in fact used by AVX for UINT_TO_FP v4i32 -> v4f64), but has to be done in the x86 backend to account for legalization via 4i32.

Differential Revision: https://reviews.llvm.org/D26938

llvm-svn: 287886
2016-11-24 15:12:56 +00:00
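
One common way legalization performs this conversion, shown per element as a sketch (assuming the usual 2^52 magic-number trick is the pattern meant): build the bits of 2^52 + x, reinterpret them as a double, and subtract 2^52.

    #include <cstdint>
    #include <cstring>

    // Exact u32 -> f64: since x < 2^32, the integer 2^52 + x is exactly
    // representable, so subtracting 2^52 leaves precisely x. Vectorized,
    // the OR and SUB operate on both lanes at once.
    double u32_to_f64(uint32_t x) {
      uint64_t bits = 0x4330000000000000ULL | x; // double bit pattern of 2^52 + x
      double d;
      std::memcpy(&d, &bits, sizeof d);
      return d - 4503599627370496.0;             // subtract 2^52
    }
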
Simon Pilgrim 841d7ca463 [X86][AVX512] Add support for v2i64 fptosi/fptoui/sitofp/uitofp on AVX512DQ-only targets
Use 512-bit instructions with subvector insertion/extraction like we do in a number of similar circumstances

llvm-svn: 287882
2016-11-24 14:46:55 +00:00
Simon Pilgrim 7c26a6f9ef [X86][AVX512DQVL] Add awareness of vcvtqq2ps and vcvtuqq2ps implicit zeroing of upper 64-bits of xmm result
llvm-svn: 287878
2016-11-24 14:02:30 +00:00
Simon Pilgrim ab323ec411 [X86][AVX512DQVL] Add support for v2i64 -> v2f32 SINT_TO_FP/UINT_TO_FP lowering
llvm-svn: 287877
2016-11-24 13:38:59 +00:00
Nikolai Bozhenov 3a8d108b2b [x86] Fixing PR28755 by precomputing the address used in CMPXCHG8B
The bug arises during register allocation on i686 for the CMPXCHG8B
instruction when a base pointer is needed. CMPXCHG8B needs 4 implicit
registers (EAX, EBX, ECX, EDX) and a memory address, plus ESI is reserved
as the base pointer. With such constraints, the only way the register
allocator can do its job successfully is when the addressing mode of the
instruction requires only one register. If that is not the case, we emit
an additional LEA instruction to compute the address.

It fixes PR28755.

Patch by Alexander Ivchenko <alexander.ivchenko@intel.com>

Differential Revision: https://reviews.llvm.org/D25088

llvm-svn: 287875
2016-11-24 13:23:35 +00:00
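
For illustration, a hedged example of the kind of source that reaches CMPXCHG8B on i686; this is an assumption about what such code looks like, not the PR's test case.

    #include <atomic>
    #include <cstdint>

    // On a 32-bit x86 target a 64-bit compare-exchange is typically lowered to
    // CMPXCHG8B, which already pins EAX/EBX/ECX/EDX and still needs an address.
    bool bump(std::atomic<uint64_t> &v) {
      uint64_t expected = v.load();
      return v.compare_exchange_strong(expected, expected + 1);
    }
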
Nikolai Bozhenov bb64aa14a3 [x86] Minor refactoring of X86TargetLowering::EmitInstrWithCustomInserter
Move the definitions of three variables out of the switch.

Patch by Alexander Ivchenko <alexander.ivchenko@intel.com>

Differential Revision: https://reviews.llvm.org/D25192

llvm-svn: 287874
2016-11-24 13:15:49 +00:00
Nikolai Bozhenov a2dabed3b6 [x86] Rewrite getAddressFromInstr helper function
- It does not modify the input instruction.
- The second operand of any address is always an index register;
  make sure we actually check for that instead of checking for
  an immediate value.

Patch by Alexander Ivchenko <alexander.ivchenko@intel.com>

Differential Revision: https://reviews.llvm.org/D24938

llvm-svn: 287873
2016-11-24 13:05:43 +00:00
Simon Pilgrim a3af79678e [X86] Generalize CVTTPD2DQ/CVTTPD2UDQ and CVTDQ2PD/CVTUDQ2PD opcodes. NFCI
Replace the CVTTPD2DQ/CVTTPD2UDQ and CVTDQ2PD/CVTUDQ2PD opcodes with general versions.

This is an initial step towards similar FP_TO_SINT/FP_TO_UINT and SINT_TO_FP/UINT_TO_FP lowering to AVX512 CVTTPS2QQ/CVTTPS2UQQ and CVTQQ2PS/CVTUQQ2PS with illegal types.

Differential Revision: https://reviews.llvm.org/D27072

llvm-svn: 287870
2016-11-24 12:13:46 +00:00
Simon Pilgrim 3ce6a545c7 [X86][SSE] Add awareness of (v)cvtpd2dq and vcvtpd2udq implicit zeroing of upper 64-bits of xmm result
We've already added the equivalent for (v)cvttpd2dq (rL284459) and vcvttpd2udq

llvm-svn: 287835
2016-11-23 22:35:06 +00:00
Michael Kuperstein 47eb85a003 [X86] Allow folding of stack reloads when loading a subreg of the spilled reg
We did not support subregs in InlineSpiller::foldMemoryOperand() because targets
may not deal with them correctly.

This adds a target hook to let the spiller know that a target can handle
subregs, and actually enables it for x86 for the case of stack slot reloads.
This fixes PR30832.

Differential Revision: https://reviews.llvm.org/D26521

llvm-svn: 287792
2016-11-23 18:33:49 +00:00
Simon Pilgrim 4e9b9cbee9 [X86][AVX512] Add support for v4i64 fptosi/fptoui/sitofp/uitofp on AVX512DQ-only targets
Use 512-bit instructions with subvector insertion/extraction like we do in a number of similar circumstances

llvm-svn: 287762
2016-11-23 14:01:18 +00:00
Simon Pilgrim 03cd8f887c [CostModel][X86] Add missing AVX512DQ v8i64 fptosi/sitofp costs
llvm-svn: 287760
2016-11-23 13:42:09 +00:00
Craig Topper f57e17def0 [AVX-512] Remove intrinsics for valignd/q and autoupgrade them to native shuffles.
llvm-svn: 287744
2016-11-23 06:54:55 +00:00
Zvi Rackover 14aba43ea9 [X86] Simplify lowerVectorShuffleAsBitMask to handle only integer VTs
Summary: This function is only called with integer VT arguments, so remove code that handles FP vectors.

Reviewers: RKSimon, craig.topper, delena, andreadb

Subscribers: llvm-commits

Differential Revision: https://reviews.llvm.org/D26985

llvm-svn: 287743
2016-11-23 06:45:25 +00:00
Kuba Mracek 06995e866b [xray] Add XRay support for Mach-O in CodeGen
Currently, XRay only supports emitting the XRay table (xray_instr_map) on ELF binaries. Let's add Mach-O support.

Differential Revision: https://reviews.llvm.org/D26983

llvm-svn: 287734
2016-11-23 02:07:04 +00:00
Simon Pilgrim 4aa876ca7c [X86][SSE] Combine UNPCKL(FHADD,FHADD) -> FHADD for v2f64 shuffles.
This occurs during UINT_TO_FP v2f64 lowering. 

We can easily generalize this to other horizontal ops (FHSUB, PACKSS, PACKUS) as required - we are doing something similar with PACKUS in lowerV2I64VectorShuffle

llvm-svn: 287676
2016-11-22 17:50:06 +00:00
Tim Northover b64fb453ea CodeGen: simplify TargetMachine::getSymbol interface. NFC.
No-one actually had a mangler handy when calling this function, and
getSymbol itself went most of the way towards getting its own mangler
(with a local TLOF variable) so forcing all callers to supply one was
just extra complication.

llvm-svn: 287645
2016-11-22 16:17:20 +00:00
Zvi Rackover 9a355219d1 [X86] Change lowerBuildVectorToBitOp() to take a BuildVectorSDNode. NFC.
llvm-svn: 287644
2016-11-22 15:33:28 +00:00
Zvi Rackover 0aa1c32d14 [X86] Remove dead code from LowerVectorBroadcast
Summary: Splat vectors are canonicalized to BUILD_VECTORs, so the code can be simplified. NFC-ish.

Reviewers: craig.topper, delena, RKSimon, andreadb

Subscribers: RKSimon, llvm-commits

Differential Revision: https://reviews.llvm.org/D26678

llvm-svn: 287643
2016-11-22 15:17:52 +00:00
Coby Tayree 49b3733d57 [AVX512][inline-asm] Fix AVX512 inline assembly instruction resolution when the size qualifier of a memory operand is not specified explicitly.
This commit handles cases where the size qualifier of an indirect memory reference operand in Intel syntax is missing (e.g. "vaddps xmm1, xmm2, [a]").

GCC will deduce the size qualifier for AVX512 vector and broadcast memory operands based on the possible matches:
"vaddps xmm1, xmm2, [a]" matches only “XMMWORD PTR” qualifier.
"vaddps xmm1, xmm2, [a]{1to4}" matches only “DWORD PTR” qualifier.

This is different from the current behavior of LLVM, which deduces the size qualifier based on the size of the memory operand.
For "vaddps xmm1, xmm2, [a]"
"char a;" will imply "BYTE PTR" qualifier
"short a;" will imply "WORD PTR" qualifier.

This commit aligns LLVM to GCC’s behavior.

This is the LLVM part of the review.
The Clang part of the review: https://reviews.llvm.org/D26587

Differential Revision: https://reviews.llvm.org/D26586

llvm-svn: 287630
2016-11-22 09:30:29 +00:00
Craig Topper 3dcf45f08d [X86] Remove alternate CodeGenOnly version of (v)movq that declared the load size as i128mem. Change all uses to use the i64mem version.
I'm sure this caused the load size to misprint in Intel syntax output. We were also inconsistent about which patterns used which instruction between VEX and EVEX.

There are two different reg/reg versions of movq, one from a GPR and one from the lower 64-bits of an XMM register. This changes the load folding table to use the single i64mem memory form for folding both cases. But we need to use TB_NO_REVERSE to prevent a duplicate entry in the unfolding table.

llvm-svn: 287622
2016-11-22 05:31:43 +00:00
Craig Topper cada9f2275 [AVX-512] Add support for commuting VPERMT2(B/W/D/Q/PS/PD) to/from VPERMI2(B/W/D/Q/PS/PD).
Summary:
The index and one of the table operands can be swapped by changing the opcode to the other version. Neither of these operands is the one that can load from memory, so this can't be used to increase memory folding opportunities.

We need to handle the unmasked forms and the kz forms. Since the load operand isn't being commuted we can commute the load and broadcast instructions too.

Reviewers: igorb, delena, Ayal, Farhana, RKSimon

Subscribers: llvm-commits

Differential Revision: https://reviews.llvm.org/D25652

llvm-svn: 287621
2016-11-22 04:57:34 +00:00
Craig Topper da22267055 [AVX-512] Add support for changing the element size of PALIGNR/VALIGND/VALIGNQ shuffles if they feed a vselect with a different type
Summary:
Shuffle lowering widens the element size of a shuffle if elements are contiguous. This sometimes helps because wider element types have more shuffle options. If the shuffle is one of the arguments to a vselect, this shuffle widening can introduce a bitcast between the vselect and the shuffle. This will prevent isel from selecting a masked operation. If the shuffle can be written equally efficiently with a different element size to match the vselect type, we should change the shuffle type to allow masking.

This patch does this conversion for all VALIGND/VALIGNQ sizes. It also supports turning 128-bit PALIGNR into VALIGND/VALIGNQ. This fixes the case shown in PR31018.

I plan to add support for more operations in future patches.

Reviewers: RKSimon, zvi, delena

Subscribers: llvm-commits

Differential Revision: https://reviews.llvm.org/D26902

llvm-svn: 287612
2016-11-22 03:51:53 +00:00
Coby Tayree 94ddbb4a04 Small fixup which enables issuing the aforementioned instruction (w/o operands) in MS/Intel syntax.
Differential Revision: https://reviews.llvm.org/D26913

llvm-svn: 287548
2016-11-21 15:50:56 +00:00
Simon Pilgrim b7bbaa669b [X86][SSE] Allow PACKSS to be used to truncate any type of all/none sign bits input
At the moment we only use truncateVectorCompareWithPACKSS with direct vector comparison results (just one example of a known all/none signbits input).

This change relaxes the direct matching of a SETCC opcode by moving the logic up into SelectionDAG::ComputeNumSignBits and accepting any input with a known splatted signbit.

llvm-svn: 287535
2016-11-21 12:05:49 +00:00
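
A per-element sketch of why PACKSS works as a truncation here: signed saturation is lossless when every input bit equals the sign bit, i.e. the value is 0 or -1, which is exactly what vector compares produce.

    #include <cstdint>

    // Per-lane behavior of PACKSSDW: signed saturation from 32 to 16 bits.
    // For all-sign-bit inputs the result is exact: 0 -> 0, -1 -> -1 (0xFFFF).
    int16_t packss_lane(int32_t v) {
      if (v < INT16_MIN) return INT16_MIN;
      if (v > INT16_MAX) return INT16_MAX;
      return (int16_t)v;
    }
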
Michael Zuckerman 8462faeaba Fixing a small typo (A->U).
This seems to fix PR30992.

-         HasAVX512 ? X86::VMOVAPSZ128rm_NOVLX 
+         HasAVX512 ? X86::VMOVUPSZ128rm_NOVLX 

llvm-svn: 287532
2016-11-21 11:52:11 +00:00
Craig Topper 9f2d632ee7 [AVX-512] Add EVEX form of VMOVZPQILo2PQIZrm to load folding tables to match SSE and AVX.
llvm-svn: 287523
2016-11-21 07:51:31 +00:00
Craig Topper 0dfc09372f [X86] Remove duplicate instructions for (v)movq and replace with patterns on other instructions. NFC
llvm-svn: 287519
2016-11-21 04:07:56 +00:00
Coby Tayree 99a6639047 The 'vpmultishiftqb' instruction was implemented incorrectly; this patch amends it.
More specifically, the (MS dialect) broadcasting variants were implemented incorrectly.

Differential Revision: https://reviews.llvm.org/D26257

llvm-svn: 287501
2016-11-20 17:19:55 +00:00
Coby Tayree 97e9cf62f4 Some instructions were missing, others were implemented incorrectly. This patch aims to amend those issues. Full list:
vcvtps2pd
vcvtudq2pd
vcvtps2qq
vcvttps2qq
vcvtps2uqq
vcvttps2uqq

variants are:

[Dst]XMM(zero-masked/merge-masked/unmasked)
[Src]Mem64

Differential Revision: https://reviews.llvm.org/D26799

llvm-svn: 287500
2016-11-20 17:09:56 +00:00
Simon Pilgrim 5fadce4a3f [X86][AVX512] Combine unary + zero target shuffles to VPERMV3 with a zero vector where possible
llvm-svn: 287497
2016-11-20 16:11:36 +00:00
Simon Pilgrim 5401bae523 [X86][AVX512] Add support for VBMI VPERMV3 target shuffle combines
llvm-svn: 287496
2016-11-20 15:24:38 +00:00
Simon Pilgrim 3f40412e0f [X86][AVX512] Add support for VBMI VPERMV target shuffle combines
llvm-svn: 287495
2016-11-20 15:05:45 +00:00
Simon Pilgrim c17e1b74b8 [X86][AVX512VL] Removed duplicate operation action
Basic AVX512F already declared uint_to_fp v4i32 as legal

llvm-svn: 287493
2016-11-20 14:19:29 +00:00
Simon Pilgrim 3f10e9953d Strip trailing whitespace
llvm-svn: 287492
2016-11-20 14:05:23 +00:00
Simon Pilgrim 096b6d4f81 [X86][AVX512F] Add support for uint_to_fp v2i32 to v2f64 on AVX512F-only targets
Use 512-bit instructions (we already do something similar for uint_to_fp v4i32 to v4f64)

llvm-svn: 287491
2016-11-20 14:03:23 +00:00
Oren Ben Simhon c0f073b67f [X86] RegCall - Handling long double arguments
The change is part of RegCall calling convention support for LLVM.
Long double (f80) requires special treatment as the first f80 parameter is saved in FP0 (floating point stack).
This review presents the change and the corresponding tests.

Differential Revision: https://reviews.llvm.org/D26151

llvm-svn: 287485
2016-11-20 11:06:07 +00:00
Coby Tayree 179ff0e541 [X86][InlineAsm]Test commit.
Fixing a wrong comment on X86AsmParser.cpp::ParseZ: "true" --> "false"

Differential Revision: https://reviews.llvm.org/D26797

llvm-svn: 287484
2016-11-20 09:31:11 +00:00
Simon Pilgrim a14e0cb852 [X86][SSE] Improve PSHUFB lowering from either input
Canonicalization may leave the zeroable vector in the first input.

llvm-svn: 287461
2016-11-19 20:41:48 +00:00
Simon Pilgrim 623a7c57b5 [X86][AVX512] Add VPERMV/VPERMV3 v64i8 byte shuffles on avx512vbmi targets
llvm-svn: 287459
2016-11-19 20:12:34 +00:00
Craig Topper 893ea9fb2c [X86] Simplify some code a little by removing a duplicate variable and combining two if statements. NFCI
llvm-svn: 287443
2016-11-19 17:33:17 +00:00
Simon Pilgrim 7938bd666e Cleanup function with clang-format. NFCI.
llvm-svn: 287340
2016-11-18 12:16:18 +00:00
Craig Topper 02b5a1b50f [AVX-512] Replace masked 16-bit element variable shift intrinsics with new unmasked versions and selects.
The same thing was done to 32-bit and 64-bit element sizes previously.

This will allow us to support these shifts in InstCombineCalls along with the other variable shift intrinsics.

llvm-svn: 287312
2016-11-18 05:04:44 +00:00
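
A per-element sketch of the unmasked-intrinsic-plus-select form the masked shifts are upgraded to, using an assumed simplified scalar model of the 16-bit-element case:

    #include <array>
    #include <cstdint>

    // Masked variable shift rewritten as: do the unmasked shift on every lane,
    // then select shifted vs. passthru per lane from the mask bits. Shift
    // amounts of 16 or more produce 0, matching the hardware semantics.
    std::array<uint16_t, 8> maskedShift(std::array<uint16_t, 8> Src,
                                        std::array<uint16_t, 8> Amt,
                                        std::array<uint16_t, 8> Passthru,
                                        uint8_t Mask) {
      std::array<uint16_t, 8> Out{};
      for (unsigned I = 0; I < 8; ++I) {
        uint16_t Shifted = Amt[I] < 16 ? (uint16_t)(Src[I] << Amt[I]) : 0;
        Out[I] = (Mask & (1u << I)) ? Shifted : Passthru[I];
      }
      return Out;
    }
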
Craig Topper 07f1c15995 [AVX-512] Support FCOPYSIGN for v16f32 and v8f64
Summary:
This extends FCOPYSIGN support to 512-bit vectors.

I've also added tests to show what the 128-bit and 256-bit cases look like with broadcast loads.

Reviewers: delena, zvi, RKSimon, spatel

Subscribers: llvm-commits

Differential Revision: https://reviews.llvm.org/D26791

llvm-svn: 287298
2016-11-18 02:25:34 +00:00
Simon Pilgrim 9d15fb3c10 Fix spelling mistakes in X86 target comments. NFC.
Identified by Pedro Giffuni in PR27636.

llvm-svn: 287247
2016-11-17 19:03:05 +00:00
Simon Pilgrim 67ef3b984a Wdocumentation fix
llvm-svn: 287224
2016-11-17 12:21:45 +00:00
Simon Pilgrim 8eca5520dc [X86][SSE] Improve lowering of vXi64 multiply with known zero 32-bit halves
vXi64 multiplication is lowered into 3 calls of vpmuludq with the upper/lower 32-bit halves.

If any of these halves are zero then we can remove individual calls. Although there was isBuildVectorAllZeros code to do this, I don't think it ever worked (maybe just for constant-folded cases that don't seem to be tested any longer).

This requires additional X86ISD support for computeKnownBitsForTargetNode, so far I've just added support for X86ISD::VZEXT (VPMOVZX* - helping the AVX2+ cases).

Partial fix for PR30845

Differential Revision: https://reviews.llvm.org/D26590

llvm-svn: 287223
2016-11-17 12:14:49 +00:00
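
The decomposition behind the three vpmuludq calls, as a scalar sketch: the low 64 bits of the product are lo·lo plus the two cross terms shifted up by 32, so a known-zero half makes a cross term vanish and its vpmuludq with it.

    #include <cstdint>

    // 64x64 -> low-64 multiply from 32-bit halves (everything mod 2^64):
    //   a*b = alo*blo + ((alo*bhi + ahi*blo) << 32)
    uint64_t mul64_from_halves(uint64_t a, uint64_t b) {
      uint64_t alo = (uint32_t)a, ahi = a >> 32;
      uint64_t blo = (uint32_t)b, bhi = b >> 32;
      return alo * blo + ((alo * bhi + ahi * blo) << 32);
    }
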
Oren Ben Simhon 489d6eff4f [X86] RegCall - Handling v64i1 in 32/64 bit target
Register Calling Convention defines a new behavior for v64i1 types.
This type should be saved in a GPR.
However, for a 32-bit machine we need to split the value into 2 GPRs (because each is 32 bits).

Differential Revision: https://reviews.llvm.org/D26181

llvm-svn: 287217
2016-11-17 09:59:40 +00:00
Craig Topper 05b0fcd168 [X86] Fix formatting. NFC
llvm-svn: 287211
2016-11-17 05:59:55 +00:00
Sanjay Patel 066139a3ec [x86] allow FP-logic ops when one operand is FP and result is FP
We save an inter-register file move this way. If there's any CPU where
the FP logic is slower, we could transform this back to int-logic in 
MachineCombiner.

This helps, but doesn't solve, PR6137:
https://llvm.org/bugs/show_bug.cgi?id=6137

The 'andn' test shows that we're missing a pattern match to
recognize the xor with -1 constant as a 'not' op.

llvm-svn: 287171
2016-11-16 22:34:05 +00:00
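
An example of the FP-value bitwise logic in question, as a sketch: fabs is just an AND that clears the sign bit, and keeping it on the FP side (ANDPS) avoids the move to integer registers described above.

    #include <cstdint>
    #include <cstring>

    // fabs expressed as bitwise logic on the float's representation; the mask
    // below is what an ANDPS with a sign-clearing constant applies per lane.
    float fabs_via_and(float x) {
      uint32_t bits;
      std::memcpy(&bits, &x, sizeof bits);
      bits &= 0x7FFFFFFFu;
      std::memcpy(&x, &bits, sizeof bits);
      return x;
    }
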
Peter Collingbourne 7d0c869b86 X86: Simplify X86ISD::Wrapper operand checks. NFCI.
We only ever create TargetConstantPool, TargetJumpTable, TargetExternalSymbol,
TargetGlobalAddress, TargetGlobalTLSAddress, MCSymbol and TargetBlockAddress
nodes as operands of X86ISD::Wrapper nodes, so we can remove one check and
invert the other.

Also update the documentation comment for X86ISD::Wrapper.

Differential Revision: https://reviews.llvm.org/D26731

llvm-svn: 287160
2016-11-16 21:48:59 +00:00
Sanjay Patel 7f3d51f840 [x86] add fake scalar FP logic instructions to ReplaceableInstrs to save some bytes
We can replace "scalar" FP-bitwise-logic with other forms of bitwise-logic instructions. 
Scalar SSE/AVX FP-logic instructions only exist in your imagination and/or the bowels of 
compilers, but logically equivalent int, float, and double variants of bitwise-logic 
instructions are reality in x86, and the float variant may be a shorter instruction 
depending on which flavor (SSE or AVX) of vector ISA you have...so just prefer float all 
the time.

This is a preliminary step towards solving PR6137:
https://llvm.org/bugs/show_bug.cgi?id=6137

Differential Revision:
https://reviews.llvm.org/D26712

llvm-svn: 287122
2016-11-16 17:42:40 +00:00
Simon Pilgrim b57dd17142 [X86][AVX512] Autoupgrade lossless i32/u32 to f64 conversion intrinsics with generic IR
Both the (V)CVTDQ2PD (i32 to f64) and (V)CVTUDQ2PD (u32 to f64) conversion instructions are lossless and can be safely represented as generic SINT_TO_FP/UINT_TO_FP calls instead of x86 intrinsics without affecting final codegen.

LLVM counterpart to D26686

Differential Revision: https://reviews.llvm.org/D26736

llvm-svn: 287108
2016-11-16 14:48:32 +00:00
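
Why the autoupgrade is safe, as a small check: a double's 53-bit significand holds any 32-bit integer exactly, so the generic conversion rounds nothing away.

    #include <cassert>
    #include <cstdint>

    // Every u32 converts to double exactly and round-trips unchanged, which is
    // what makes (V)CVTUDQ2PD representable as a plain uint_to_fp.
    void check_lossless(uint32_t x) {
      double d = (double)x;
      assert((uint32_t)d == x);
    }
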
Ayman Musa 4d60243bfd [X86][AVX512] Removing llvm x86 intrinsics for _mm_mask_move_{ss|sd} intrinsics.
Differential Revision: https://reviews.llvm.org/D26128

llvm-svn: 287087
2016-11-16 09:00:28 +00:00
Craig Topper 6910fa0ef4 [X86] Remove the scalar intrinsics for fadd/fsub/fdiv/fmul
Summary: These intrinsics have been unused by clang for a while. This patch removes them. We auto upgrade them to extractelements, a scalar operation and then an insertelement. This matches the sequence used by clang's intrinsic file.

Reviewers: zvi, delena, RKSimon

Subscribers: llvm-commits

Differential Revision: https://reviews.llvm.org/D26660

llvm-svn: 287083
2016-11-16 05:24:10 +00:00
Sanjay Patel 73d1d35d21 fix formatting; NFC
llvm-svn: 286989
2016-11-15 17:47:13 +00:00
Simon Pilgrim ceffb43b1b [X86][SSE] Improve SINT_TO_FP of boolean vector results (signum)
This patch helps avoid poor legalization of boolean vector results (e.g. 8f32 -> 8i1 -> 8i16) that feed into SINT_TO_FP by inserting an early SIGN_EXTEND, which helps improve the truncation logic.

This is not necessary for AVX512 targets where boolean vectors are legal - AVX512 manages to lower ( sint_to_fp vXi1 ) into some form of ( select mask, 1.0f , 0.0f ) in most cases.

Fix for PR13248

Differential Revision: https://reviews.llvm.org/D26583

llvm-svn: 286979
2016-11-15 16:24:40 +00:00
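
A scalar sketch of the signum pattern the patch improves: sign-extend the boolean compare result first (true becomes -1), then convert, so the float result is -1.0f or 0.0f.

    #include <cstdint>

    // Early sign-extension of the i1 result followed by SINT_TO_FP.
    float bool_to_fp(bool cmp) {
      int32_t wide = cmp ? -1 : 0; // SIGN_EXTEND of the boolean lane
      return (float)wide;          // -1.0f for true, 0.0f for false
    }
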
Zvi Rackover 6f76f46d2c [X86][FastISel] Assert that we are dealing with arithmetic with overflow intrinsics. NFC
llvm-svn: 286961
2016-11-15 13:50:35 +00:00
Zvi Rackover f0b9b57bd3 [X86][FastISel] Fix lowering of overflow result on AVX512 targets
Summary:
    Fix a case where the overflow value of type i1, which is legal on AVX512, was assigned to a VK1 register class.
    We always want this value to be assigned to a GPR since the overflow return value is lowered to a SETO instruction.

    Fixes pr30981.

    Reviewers: mkuper, igorb, craig.topper, guyblank, qcolombet

    Subscribers: qcolombet, llvm-commits

    Differential Revision: https://reviews.llvm.org/D26620

llvm-svn: 286958
2016-11-15 13:29:23 +00:00
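
For context, a minimal example of the arithmetic-with-overflow pattern involved; the i1 overflow result is what must end up in a GPR (via SETO) rather than a VK1 mask register.

    #include <cstdint>

    // clang/gcc overflow builtin; the returned bool is the overflow flag.
    bool add_overflows(int32_t a, int32_t b, int32_t &sum) {
      return __builtin_add_overflow(a, b, &sum);
    }
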
Zvi Rackover 76dbf26599 [X86][GlobalISel] Add minimal call lowering support to the IRTranslator
Summary:
    Add basic functionality to support call lowering for X86.
    Currently only supports functions which return void and take zero arguments.
    Inspired by commit 286573.

Reviewers: ab, qcolombet, t.p.northover

Subscribers: llvm-commits

Differential Revision: https://reviews.llvm.org/D26593

llvm-svn: 286935
2016-11-15 06:34:33 +00:00
Simon Pilgrim 779da8e5ea [CostModel][X86] Added mul costs for vXi8 vectors
More realistic v16i8/v32i8/v64i8 MUL costs - we have to extend to vXi16, use PMULLW and then truncate the result

llvm-svn: 286838
2016-11-14 15:54:24 +00:00
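
The per-lane shape of the vXi8 multiply the new costs reflect, as a sketch: widen to 16 bits, multiply (PMULLW on the widened vector), truncate back.

    #include <cstdint>

    // The low 8 bits of an 8x8 multiply are preserved by doing it at 16 bits
    // and truncating, which is the extend/PMULLW/truncate sequence being costed.
    uint8_t mul_i8(uint8_t a, uint8_t b) {
      uint16_t wide = (uint16_t)a * (uint16_t)b;
      return (uint8_t)wide;
    }
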
Simon Pilgrim 27fed8e5d6 [X86][AVX] Fixed v16i16/v32i8 ADD/SUB costs on AVX1 subtargets
Add explicit v16i16/v32i8 ADD/SUB costs, matching the costs of v4i64/v8i32 - they were missing for some reason.

This has side effects on the LV max bandwidth tests (AVX1 now prefers 128-bit vectors vs AVX2 which still prefers 256-bit)

llvm-svn: 286832
2016-11-14 14:45:16 +00:00
Craig Topper 8f85ad1755 [AVX-512] Add suffixless aliases for EVEX encoded vcvtsi2ss/vcvtsi2sd/vcvtusi2ss/vcvtusi2sd. This matches the VEX behavior.
Fixes another problem from PR28850.

llvm-svn: 286790
2016-11-14 02:46:58 +00:00
Craig Topper b8596e4d1d [X86] Cleanup 'x' and 'y' mnemonic suffixes for vcvtpd2dq/vcvttpd2dq/vcvtpd2ps and similar instructions.
-Don't print the 'x' suffix for the 128-bit reg/mem VEX encoded instructions in Intel syntax. This is consistent with the EVEX versions.
-Don't print the 'y' suffix for the 256-bit reg/reg VEX encoded instructions in Intel or AT&T syntax. This is consistent with the EVEX versions.
-Allow the 'x' and 'y' suffixes to be used for the reg/mem forms when we're assembling using Intel syntax.
-Allow the 'x' and 'y' suffixes on the reg/reg EVEX encoded instructions in Intel or AT&T syntax. This is consistent with what VEX was already allowing.

This should fix at least some of PR28850.

llvm-svn: 286787
2016-11-14 01:53:29 +00:00
Craig Topper 353e59b6d6 [AVX-512] Remove and autoupgrade masked dword/qword variable shift intrinsics to the new unmasked versions and selects.
llvm-svn: 286786
2016-11-14 01:53:22 +00:00
Craig Topper ba13703bb3 [AVX-512] Fix a disassembler failure for AVX-512 vcmpss/vcmpsd with an immediate larger than 32. Fix the same bug with VLX vcmpps/vcmppd.
Fixes PR24941.

llvm-svn: 286775
2016-11-13 19:58:18 +00:00
Igor Breger e2399f9e0e revert commit r286761, some builds failed on Win platforms
llvm-svn: 286765
2016-11-13 15:48:11 +00:00
Ayman Musa c09b3769ae [X86][AVX512] Removing llvm x86 intrinsics for _mm_mask_move_{ss|sd} intrinsics.
Differential Revision: https://reviews.llvm.org/D26128

llvm-svn: 286761
2016-11-13 14:51:25 +00:00
Ayman Musa 46af8f9c6f [X86][AVX512] Add patterns for all variants of VMOVSS/VMOVSD instructions.
Differential Revision: https://reviews.llvm.org/D26022

llvm-svn: 286758
2016-11-13 14:29:32 +00:00
Craig Topper 43e97649a1 [AVX-512] Add unmasked intrinsics for variable shifts of dwords and qwords.
These will be used to replace the masked intrinsics so that InstCombineCalls can optimize the AVX-512 variable shifts the same way it does for AVX2.

llvm-svn: 286754
2016-11-13 07:26:15 +00:00
Craig Topper da6a63db1c [AVX-512] Remove the remaining masked shift by immediate or by single value. Autoupgrade them to recently introduced unmasked versions and a select.
After this I'll add the unmasked intrinsics to InstCombineCalls to finish making our handling of these types of shifts consistent between AVX-512 and the legacy intrinsics.

llvm-svn: 286725
2016-11-12 18:04:46 +00:00
Craig Topper 9d25c5e2fa [AVX-512] Add unmasked version of shift by immediate and shift by single element in XMM.
Summary:
This is the first step towards being able to add the avx512 shift by immediate intrinsics to InstCombineCalls where we aleady support the sse2 and avx2 intrinsics. We need to the unmasked versions so we can avoid having to teach InstCombineCalls that it would need to insert selects sometimes. Instead we'll just add the selects around the new instrinsics in the frontend.

This change should also enable the shift by i32 intrinsics to take a non-constant shift value just like the avx2 and sse intrinsics. This will enable us to fix PR30691 once we update clang.

Next I'll switch clang to use the new builtins. Then we'll come back to the backend and remove/autoupgrade the old intrinsics. Then I'll work on the same series for variable shifts.

Reviewers: RKSimon, zvi, delena

Subscribers: llvm-commits

Differential Revision: https://reviews.llvm.org/D26333

llvm-svn: 286711
2016-11-12 05:28:24 +00:00
Craig Topper 5cb13062d2 [AVX-512] Add support for lowering shuffles to VALIGND/VALIGNQ
Summary: VALIGND and VALIGNQ are similar to PALIGNR but instead of working on a 128-bit lane they work on the entire vector register. This change leverages the shuffle rotate detection code used for PALIGNR to detect these cases.

Reviewers: delena, RKSimon

Subscribers: Farhana, llvm-commits

Differential Revision: https://reviews.llvm.org/D26297

llvm-svn: 286709
2016-11-12 05:05:27 +00:00
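
A much-simplified sketch of the rotate detection being reused, assuming the LLVM shuffle-mask convention (indices 0..N-1 pick from the first source, N..2N-1 from the second): a whole-register rotation by R means result element I reads concatenated element I + R.

    #include <vector>

    // Return the element rotation amount a two-source shuffle mask encodes,
    // or -1 if it is not a rotation. Undefined lanes (-1) match anything.
    int matchElementRotate(const std::vector<int> &Mask) {
      int N = (int)Mask.size();
      for (int R = 1; R < N; ++R) {
        bool Match = true;
        for (int I = 0; I < N; ++I)
          if (Mask[I] >= 0 && Mask[I] != I + R) { Match = false; break; }
        if (Match) return R;
      }
      return -1;
    }
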