
4913 Commits

Author SHA1 Message Date
Craig Topper 1a0da2db5f [X86] Add support for combining FMADDSUB(A, B, FNEG(C))->FMSUBADD(A, B, C)
Support the opposite direction as well. Also add a TODO for not being able to combine FMSUB/FNMADD/FNMSUB with FNEG.
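
A scalar C++ model of the lane semantics can show why this fold is valid. It is not LLVM's DAG code. FMADDSUB subtracts C in even lanes and adds it in odd lanes, and FMSUBADD does the reverse. Negating C therefore turns one into the other. The 4-lane width and the function names are illustrative assumptions.

```cpp
#include <array>
#include <cassert>
#include <cmath>
#include <cstddef>

using V4 = std::array<double, 4>;

// VFMADDSUB lane semantics: even lanes a*b - c, odd lanes a*b + c.
V4 fmaddsub(const V4& a, const V4& b, const V4& c) {
  V4 r{};
  for (std::size_t i = 0; i < r.size(); ++i)
    r[i] = (i % 2 == 0) ? std::fma(a[i], b[i], -c[i]) : std::fma(a[i], b[i], c[i]);
  return r;
}

// VFMSUBADD lane semantics: even lanes a*b + c, odd lanes a*b - c.
V4 fmsubadd(const V4& a, const V4& b, const V4& c) {
  V4 r{};
  for (std::size_t i = 0; i < r.size(); ++i)
    r[i] = (i % 2 == 0) ? std::fma(a[i], b[i], c[i]) : std::fma(a[i], b[i], -c[i]);
  return r;
}

// FNEG applied lane-wise.
V4 fneg(V4 v) {
  for (double& x : v) x = -x;
  return v;
}
```

With these definitions, `fmaddsub(a, b, fneg(c))` and `fmsubadd(a, b, c)` perform the identical `fma` call in every lane. That is why the rewrite is exact and not merely approximate.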

llvm-svn: 317878
2017-11-10 08:22:37 +00:00
Craig Topper 93e27d2ecc [X86] Make sure we don't read too many operands from X86ISD::FMADDS1/FMADDS3 nodes when doing FNEG combine.
r317453 added new ISD nodes without rounding modes that were added to an existing if/else chain. But all the previous nodes handled there included a rounding mode. The final code after this if/else chain expected an extra operand that isn't present for the new nodes.

llvm-svn: 317748
2017-11-09 01:06:47 +00:00
Craig Topper cf8e6d0a76 [X86] Add support for using EVEX instructions for the legacy vcvtph2ps intrinsics.
Looks like there's some missed load folding opportunities for i64 loads.

llvm-svn: 317544
2017-11-07 07:13:03 +00:00
Craig Topper 428a4e6374 [X86] Make FeatureAVX512 imply FeatureF16C.
The EVEX to VEX pass is already assuming this is true under AVX512VL. We had special patterns to use zmm instructions if VLX and F16C weren't available.

Instead just make AVX512 imply F16C to make the EVEX to VEX behavior explicitly legal and remove the extra patterns.

All known CPUs with AVX512 have F16C, so this should be safe for now.
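
A feature implication can be pictured as a transitive closure over a feature graph. The sketch below is a hypothetical model: the table and `closeFeatures` are illustrative and are not LLVM's TableGen `SubtargetFeature` definitions. It shows that once AVX512F implies F16C, enabling AVX512F alone is enough to make F16C available.

```cpp
#include <cassert>
#include <map>
#include <set>
#include <string>
#include <vector>

// Hypothetical, simplified implication table (an illustrative subset, not X86.td).
const std::map<std::string, std::vector<std::string>> kImplies = {
    {"avx512f", {"avx2", "f16c"}},
    {"avx2", {"avx"}},
    {"f16c", {"avx"}},
};

// Returns the enabled features plus everything they transitively imply.
std::set<std::string> closeFeatures(const std::set<std::string>& enabled) {
  std::set<std::string> result;
  std::vector<std::string> work(enabled.begin(), enabled.end());
  while (!work.empty()) {
    std::string f = work.back();
    work.pop_back();
    if (!result.insert(f).second) continue;  // already visited
    auto it = kImplies.find(f);
    if (it != kImplies.end())
      for (const std::string& g : it->second) work.push_back(g);
  }
  return result;
}
```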

llvm-svn: 317521
2017-11-06 22:49:04 +00:00
Simon Pilgrim ad9b9720e8 [X86][SSE] Merge combineExtractVectorElt_SSE into combineExtractVectorElt. NFCI.
We still early-out for X86ISD::PEXTRW/X86ISD::PEXTRB so no actual change in behaviour, but it'll make it easier to add support in a future patch.

llvm-svn: 317485
2017-11-06 15:28:25 +00:00
Simon Pilgrim 14450720e6 [X86][SSE] Combine EXTRACT_VECTOR_ELT with combineExtractWithShuffle before XFormVExtractWithShuffleIntoLoad
combineExtractWithShuffle can handle more complex shuffles/bitcasts than we can with the equivalent code in XFormVExtractWithShuffleIntoLoad.

Mainly a compile time improvement now (combineExtractWithShuffle combines will have always failed late on inside XFormVExtractWithShuffleIntoLoad), and will let us merge combineExtractVectorElt_SSE in a future commit.
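
The idea behind combining EXTRACT_VECTOR_ELT through a shuffle is simple. Extracting lane `i` of a shuffled vector gives the same value as extracting lane `mask[i]` of the shuffle's inputs. The minimal model below assumes a two-input mask in which indices `0..N-1` select from the first input, `N..2N-1` select from the second, and `-1` marks an undef lane, as in LLVM's `shufflevector`. `extractThroughShuffle` is a hypothetical helper.

```cpp
#include <array>
#include <cassert>
#include <optional>

constexpr int N = 4;
using Vec = std::array<int, N>;
using Mask = std::array<int, N>;  // -1 = undef lane

// Materialize the shuffle (undef lanes modeled as 0).
Vec shuffle(const Vec& a, const Vec& b, const Mask& m) {
  Vec r{};
  for (int i = 0; i < N; ++i)
    r[i] = m[i] < 0 ? 0 : (m[i] < N ? a[m[i]] : b[m[i] - N]);
  return r;
}

// extract(shuffle(a, b, m), i) -> extract(a or b, m[i]), without building the shuffle.
std::optional<int> extractThroughShuffle(const Vec& a, const Vec& b, const Mask& m, int i) {
  assert(i >= 0 && i < N);
  int src = m[i];
  if (src < 0) return std::nullopt;  // undef lane: any value is allowed
  return src < N ? a[src] : b[src - N];
}
```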

llvm-svn: 317481
2017-11-06 14:34:19 +00:00
Uriel Korach bb86686a8b [X86][AVX512] Improve lowering of AVX512 test intrinsics
Added TESTM and TESTNM to the list of instructions that already zero the unused upper bits
and therefore do not need the redundant shift-left and shift-right instructions afterwards.
Added a pattern for TESTM and TESTNM in iselLowering, so now icmp(ne, and(X,Y), 0) folds into TESTM
and icmp(eq, and(X,Y), 0) folds into TESTNM.
This commit is a preparation for lowering the test and testn X86 intrinsics to IR.
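
A scalar model of the mask-producing semantics shows why these folds hold. VPTESTM sets mask bit `i` when `X[i] & Y[i]` is nonzero, which is exactly `icmp ne (and X, Y), 0`. VPTESTNM sets the bit when the AND is zero, which matches `icmp eq`. The model uses 8 lanes of `uint32_t` writing into a 16-bit mask; the widths and names are illustrative. Bits 8..15 stay zero, which is the "already zero the unused upper bits" property.

```cpp
#include <array>
#include <cassert>
#include <cstdint>

using V8 = std::array<uint32_t, 8>;

// Model of VPTESTMD on 8 lanes: bit i = ((x[i] & y[i]) != 0); bits 8..15 stay zero.
uint16_t testm(const V8& x, const V8& y) {
  uint16_t k = 0;
  for (int i = 0; i < 8; ++i)
    if ((x[i] & y[i]) != 0) k |= uint16_t(1u << i);
  return k;
}

// Model of VPTESTNMD on 8 lanes: bit i = ((x[i] & y[i]) == 0).
uint16_t testnm(const V8& x, const V8& y) {
  uint16_t k = 0;
  for (int i = 0; i < 8; ++i)
    if ((x[i] & y[i]) == 0) k |= uint16_t(1u << i);
  return k;
}
```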

Differential Revision: https://reviews.llvm.org/D38732

llvm-svn: 317465
2017-11-06 09:22:38 +00:00
Zvi Rackover 3122698040 X86 ISel: Basic support for variable-index vector permutations
Summary:
Try to lower a BUILD_VECTOR composed of extract-extract chains that can be
reasoned to be a permutation of a vector by indices in a non-constant vector.

We saw this pattern created by ISPC, which resorts to creating it because
shufflevector's mask operand is required to be a *constant* vector.
I didn't check this, but we could possibly use this pattern for lowering the X86 permute
C intrinsics instead of llvm.x86 intrinsics.

This change can be followed by more improvements:
1. Handle vectors with undef elements.
2. Utilize pshufb and zero-mask-blending to support more efficient
   construction of vectors with constant-0 elements.
3. Use smaller-element vectors of same width, and "interpolate" the indices,
   when no native operation available.
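
The matched pattern is a BUILD_VECTOR whose lane `i` is `extract(Src, extract(Idx, i))`. Taken lane by lane, that is the variable permutation `Out[i] = Src[Idx[i]]`, which instructions such as VPERMD implement. In the minimal C++ model below, the names are hypothetical. It also assumes that, as with VPERMD on 8 dword lanes, the instruction uses only the low 3 bits of each index.

```cpp
#include <array>
#include <cassert>
#include <cstddef>
#include <cstdint>

constexpr std::size_t kLanes = 8;
using V = std::array<uint32_t, kLanes>;

// What the BUILD_VECTOR of extract-extract chains computes, lane by lane.
V buildFromExtracts(const V& src, const V& idx) {
  V out{};
  for (std::size_t i = 0; i < kLanes; ++i) {
    uint32_t lane = idx[i];  // extract(Idx, i)
    assert(lane < kLanes);   // an out-of-range index would be poison in IR
    out[i] = src[lane];      // extract(Src, lane)
  }
  return out;
}

// VPERMD-style permute: only the low 3 bits of each index are used.
V permd(const V& src, const V& idx) {
  V out{};
  for (std::size_t i = 0; i < kLanes; ++i) out[i] = src[idx[i] & (kLanes - 1)];
  return out;
}
```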

Reviewers: RKSimon, craig.topper

Reviewed By: RKSimon

Subscribers: chandlerc, DavidKreitzer

Differential Revision: https://reviews.llvm.org/D39126

llvm-svn: 317463
2017-11-06 08:25:46 +00:00
Jina Nahias 3844f1ad5c Revert "adding a pattern for broadcastm"
This reverts commit r317457.

Change-Id: If07f1fca1e3453d16c1dac906e87768661384e91
llvm-svn: 317462
2017-11-06 07:48:58 +00:00
Jina Nahias 7b705f1f91 [x86][AVX512] Lowering Broadcastm intrinsics to LLVM IR
This patch, together with a matching clang patch (https://reviews.llvm.org/D38683), implements the lowering of X86 broadcastm intrinsics to IR.

Differential Revision: https://reviews.llvm.org/D38684

Change-Id: I709ac0b34641095397e994c8ff7e15d1315b3540
llvm-svn: 317458
2017-11-06 07:09:24 +00:00
Jina Nahias 9c6561b648 adding a pattern for broadcastm
Change-Id: I6551fb13879e098aed74de410e29815cf37d9ab5
llvm-svn: 317457
2017-11-06 07:09:09 +00:00
Craig Topper 07dac55d95 [X86] Add scalar FMA ISD nodes without rounding mode. NFC
Next step is to use them for the legacy FMA scalar intrinsics as well. This will enable the legacy intrinsics to use EVEX encoded opcodes and the extended registers.

llvm-svn: 317453
2017-11-06 05:48:25 +00:00
Craig Topper 692c8efe30 [X86] Don't use RCP14 and RSQRT14 for reciprocal estimations or for legacy SSE rcp/rsqrt intrinsics when AVX512 features are enabled.
Summary:
AVX512 added RCP14 and RSQRT14 instructions, which improve accuracy over the legacy RCP and RSQRT instructions, but not enough to remove the need for a Newton-Raphson refinement.

Currently we use these new instructions for the legacy packed SSE intrinsics, but not the scalar intrinsics. We also use them for the fast-math optimization of division and reciprocal sqrt.

I think switching the legacy intrinsics may be surprising to the user, since it changes the answer based on which processor you're using, regardless of any fast-math settings. It's also weird that we did something different between scalar and packed.

As far as the reciprocal estimation goes, I think it creates unnecessary deltas in our output behavior (and prevents EVEX->VEX). A little playing around with gcc, icc, and godbolt suggests they don't change which instructions they use here.

This patch adds new X86ISD nodes for RCP14/RSQRT14 and uses those for the new intrinsics, leaving the old intrinsics to use the old instructions.

Going forward I think our focus should be on:
- Supporting 512-bit vectors, which will have to use RCP14/RSQRT14.
- Using RSQRT28/RCP28 to remove the Newton-Raphson step on processors with AVX512ER.
- Supporting double precision.
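
The refinement step mentioned above can be sketched in scalar code. If an estimate has relative error e, one Newton-Raphson iteration leaves an error of roughly e^2 (about 1.5 e^2 for rsqrt). A 14-bit estimate therefore becomes accurate to about 28 bits. The hardware estimate is not modeled here; instead it is simulated by perturbing the exact value by 2^-14, which is an assumption.

```cpp
#include <cassert>
#include <cmath>

// One Newton-Raphson step for 1/a: x1 = x0 * (2 - a * x0).
double refineRecip(double a, double x0) { return x0 * (2.0 - a * x0); }

// One Newton-Raphson step for 1/sqrt(a): x1 = x0 * (1.5 - 0.5 * a * x0 * x0).
double refineRsqrt(double a, double x0) { return x0 * (1.5 - 0.5 * a * x0 * x0); }

// Relative error of an approximation.
double relErr(double approx, double exact) {
  return std::fabs(approx - exact) / std::fabs(exact);
}
```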

Reviewers: zvi, DavidKreitzer, RKSimon

Reviewed By: RKSimon

Subscribers: llvm-commits

Differential Revision: https://reviews.llvm.org/D39583

llvm-svn: 317413
2017-11-04 18:26:41 +00:00
Craig Topper a96d62b360 [X86] Teach shuffle lowering to use 256-bit SHUF128 when possible.
This allows masked operations to be used and allows the register allocator to use YMM16-31 if necessary.

As a follow up I'll look into teaching EVEX->VEX how to turn this back into PERM2X128 if any of the additional features don't work out.
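
The 256-bit SHUF128 forms (for example VSHUFF64X2/VSHUFI64X2) take the result's low 128-bit lane from the first source and its high lane from the second. Each choice is made by one immediate bit. PERM2X128 (VPERM2F128/VPERM2I128) is more general: each result lane can be either lane of either source, or zero. The lane-level model below is an illustration. `shuf128ToPerm2x128Imm` is a hypothetical helper showing that every SHUF128 immediate has a PERM2X128 equivalent, which is what would make an EVEX->VEX rewrite back possible.

```cpp
#include <array>
#include <cassert>
#include <cstdint>

using Lane = std::array<uint64_t, 2>;  // one 128-bit lane as two 64-bit halves
using Y = std::array<Lane, 2>;         // a 256-bit register: lanes 0 and 1

// 256-bit SHUF128: low lane from src1 (imm bit 0), high lane from src2 (imm bit 1).
Y shuf128(const Y& src1, const Y& src2, uint8_t imm) {
  return Y{src1[imm & 1], src2[(imm >> 1) & 1]};
}

// PERM2X128: each nibble selects one of four source lanes; bit 3 of a nibble zeroes it.
Y perm2x128(const Y& src1, const Y& src2, uint8_t imm) {
  auto pick = [&](unsigned sel) -> Lane {
    if (sel & 8) return Lane{0, 0};
    return (sel & 2) ? src2[sel & 1] : src1[sel & 1];
  };
  return Y{pick(imm & 0xF), pick(imm >> 4)};
}

// Map a SHUF128 immediate to an equivalent PERM2X128 immediate.
uint8_t shuf128ToPerm2x128Imm(uint8_t imm) {
  return uint8_t((imm & 1) | ((2 | ((imm >> 1) & 1)) << 4));
}
```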

llvm-svn: 317403
2017-11-04 06:44:47 +00:00
Craig Topper d21a53f246 [X86] Give unary PERMI priority over SHUF128 in lowerV8I64VectorShuffle to make it possible to fold a load.
llvm-svn: 317382
2017-11-03 22:48:13 +00:00
Simon Pilgrim ae1f013495 [X86][SSE] Add PACKUS support to combineVectorTruncation
Similar to the existing code to lower to PACKSS, we can use PACKUS if the input vector's leading zero bits extend all the way to the packed/truncated value.

We have to account for pre-SSE41 targets not supporting PACKUSDW
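
Why the leading-zero condition is enough can be seen in a single-lane model. PACKUSDW clamps each signed 32-bit lane to [0, 65535]. When the upper 16 bits of a lane are known zero, the value is already in range, so the saturating pack gives the same result as a plain truncation. The helper names below are illustrative.

```cpp
#include <algorithm>
#include <cassert>
#include <cstdint>

// Scalar model of one PACKUSDW lane: signed i32 -> unsigned-saturated u16.
uint16_t packusLane(int32_t v) {
  return static_cast<uint16_t>(std::clamp<int32_t>(v, 0, 0xFFFF));
}

// Plain truncation, which is what the TRUNCATE node asks for.
uint16_t truncLane(int32_t v) { return static_cast<uint16_t>(v); }

// The condition from the message: at least 16 leading zero bits in the lane.
bool packusIsTruncation(int32_t v) { return (static_cast<uint32_t>(v) >> 16) == 0; }
```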

llvm-svn: 317315
2017-11-03 11:33:48 +00:00
Craig Topper 333897ec31 [X86] Remove PALIGNR/VALIGN handling from combineBitcastForMaskedOp and move to isel patterns instead. Prefer 128-bit VALIGND/VALIGNQ over PALIGNR during lowering when possible.
llvm-svn: 317299
2017-11-03 06:48:02 +00:00
Simon Pilgrim e152c2c447 [X86][SSE] Add PACKUS support to LowerTruncate
Similar to the existing code to lower to PACKSS, we can use PACKUS if the input vector's leading zero bits extend all the way to the packed/truncated value.

We have to account for pre-SSE41 targets not supporting PACKUSDW

llvm-svn: 317128
2017-11-01 21:52:29 +00:00
Simon Pilgrim 778810eb42 [X86][SSE] Began generalizing truncateVectorWithPACKSS to work with PACKSS/PACKUS functions
Renamed to truncateVectorWithPACK

llvm-svn: 317098
2017-11-01 15:31:51 +00:00
Simon Pilgrim f657ba0cb6 [X86][SSE] Truncate with PACKSS any input with sufficient sign-bits
So far we've only been using PACKSS truncations with 'all-bits or zero-bits' patterns (vector comparison results etc.), but we can safely use it for any case as long as the number of sign bits reaches down to the last 16 bits (or 8 bits if we're truncating to bytes).

The next steps after this are to add the equivalent support for PACKUS and to support packing to sub-128-bit vectors for truncating stores etc.
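
The sign-bit condition can be shown with a single-lane model. PACKSSDW saturates a signed 32-bit lane to the i16 range. When a lane has more than 16 sign bits, it already fits in i16, so saturation never triggers and the pack equals truncation. `numSignBits` here is a hypothetical stand-in for what ComputeNumSignBits would report for a known constant.

```cpp
#include <algorithm>
#include <cassert>
#include <cstdint>

// Count of leading bits equal to the sign bit, including the sign bit itself.
unsigned numSignBits(int32_t v) {
  uint32_t u = static_cast<uint32_t>(v);
  uint32_t sign = u >> 31;
  unsigned n = 1;
  for (int bit = 30; bit >= 0 && ((u >> bit) & 1u) == sign; --bit) ++n;
  return n;
}

// Scalar model of one PACKSSDW lane: signed i32 -> signed-saturated i16.
int16_t packssLane(int32_t v) {
  return static_cast<int16_t>(std::clamp<int32_t>(v, INT16_MIN, INT16_MAX));
}

// Plain truncation to the low 16 bits.
int16_t truncLane(int32_t v) { return static_cast<int16_t>(v); }
```

For i32 -> i16 the requirement is `numSignBits(v) > 16`. When truncating to bytes, the analogous bound is 8 for i16 -> i8.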

Differential Revision: https://reviews.llvm.org/D39476

llvm-svn: 317086
2017-11-01 11:47:44 +00:00
Simon Pilgrim f3c33ca83e [X86][SSE] Add VSRLI/VSRAI/VSLLI demanded elts support to computeKnownBits/ComputeNumSignBits
Mainly a perf improvement, as most combines will have occurred before we lower to these instructions.
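
These nodes shift every lane by the same immediate, so each result lane depends only on the matching source lane. Demanded elements therefore pass straight through to the source. The per-lane rules can be modeled as below. `Known32` is a hypothetical 32-bit stand-in for `llvm::KnownBits`, and shift amounts of 32 or more are deliberately excluded from the model.

```cpp
#include <algorithm>
#include <cassert>
#include <cstdint>

// Zero/One mark bits known to be 0/1.
struct Known32 {
  uint32_t Zero = 0, One = 0;
};

// VSRLI lane: known bits shift down; the vacated top `amt` bits are known zero.
Known32 knownSrl(Known32 k, unsigned amt) {
  assert(amt < 32);
  Known32 r;
  r.Zero = (k.Zero >> amt) | ~(UINT32_MAX >> amt);
  r.One = k.One >> amt;
  return r;
}

// VSLLI lane: known bits shift up; the vacated low `amt` bits are known zero.
Known32 knownShl(Known32 k, unsigned amt) {
  assert(amt < 32);
  Known32 r;
  r.Zero = (k.Zero << amt) | ((1u << amt) - 1);
  r.One = k.One << amt;
  return r;
}

// VSRAI lane: each bit position shifted adds one more copy of the sign bit.
unsigned numSignBitsSra(unsigned srcSignBits, unsigned amt) {
  return std::min(32u, srcSignBits + amt);
}
```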

llvm-svn: 317005
2017-10-31 16:06:21 +00:00
Jina Nahias 5bf6620b15 [X86][AVX512] Adding a pattern for broadcastm intrinsic.
Differential Revision: https://reviews.llvm.org/D38312

Change-Id: I71c8605a8e4c98013ef25289694afc5cfd46bb0b
llvm-svn: 316921
2017-10-30 16:37:28 +00:00
Craig Topper 4e13d4de52 [X86] Make sure we don't create locked inc/dec instructions when the carry flag is being used.
Summary:
INC/DEC don't update the carry flag so we need to make sure we don't try to use it.

This patch introduces new X86ISD opcodes for locked INC/DEC. It teaches lowerAtomicArithWithLOCK to emit these nodes if INC/DEC is not slow or the function is being optimized for size. An additional flag is added that allows INC/DEC to be disabled if the caller determines that the carry flag is being requested.

The test_sub_1_cmp_1_setcc_ugt test is currently showing this bug. The other test case changes are recovering cases that were regressed in r316860.

This should fully fix PR35068 finishing the fix started in r316860.
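
The hazard can be shown with a tiny flag model. After `SUB x, 1`, SETA answers `x >u 1` by reading both CF and ZF. `DEC x` produces the same value but does not write CF, so SETA then sees whatever CF an earlier instruction left behind. The `Flags` struct and helper names are hypothetical, and flags other than CF and ZF are omitted.

```cpp
#include <cassert>
#include <cstdint>

struct Flags {
  bool CF = false, ZF = false;
};

// SUB a, b: CF is set on unsigned borrow (a < b).
uint32_t sub32(uint32_t a, uint32_t b, Flags& f) {
  uint32_t r = a - b;
  f.CF = a < b;
  f.ZF = r == 0;
  return r;
}

// DEC a: same result as SUB a, 1, but CF is left unchanged.
uint32_t dec32(uint32_t a, Flags& f) {
  uint32_t r = a - 1;
  f.ZF = r == 0;  // CF intentionally not written
  return r;
}

// SETA ("ugt"): true when CF == 0 and ZF == 0.
bool seta(const Flags& f) { return !f.CF && !f.ZF; }
```

With x = 0, the SUB path correctly yields "0 >u 1 is false". The DEC path, starting from a clear CF, yields true, which is the kind of miscompile that this patch and r316860 guard against.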

Reviewers: RKSimon, zvi, spatel

Reviewed By: zvi

Subscribers: llvm-commits

Differential Revision: https://reviews.llvm.org/D39411

llvm-svn: 316913
2017-10-30 14:51:37 +00:00
Jina Nahias e63db55c67 Revert "[X86][AVX512] Adding a pattern for broadcastm intrinsic."
This reverts commit r316890.

Change-Id: I683cceee9848ef309b452293086b1f26a941950d
llvm-svn: 316894
2017-10-30 10:35:53 +00:00
Jina Nahias 70280f9a0d [X86][AVX512] Adding a pattern for broadcastm intrinsic.
Differential Revision: https://reviews.llvm.org/D38312

Change-Id: I6551fb13879e098aed74de410e29815cf37d9ab5
llvm-svn: 316890
2017-10-30 09:59:52 +00:00
Craig Topper 495a1bc893 [X86] Remove combine that turns X86ISD::LSUB into X86ISD::LADD. Update patterns that depended on this.
If the carry flag is being used, this transformation isn't safe.

This does prevent some test cases from using DEC now, but I'll try to look into that separately.

Fixes PR35068.
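
The reason the LSUB -> LADD rewrite is unsafe can be checked with plain arithmetic. `x - C` and `x + (-C)` produce the same value, but their carry flags differ: SUB sets CF on borrow, while ADD sets it on carry-out. The helpers below are illustrative models of those two flag rules.

```cpp
#include <cassert>
#include <cstdint>

// Carry flag after SUB a, b: set on unsigned borrow.
bool carryAfterSub(uint32_t a, uint32_t b) { return a < b; }

// Carry flag after ADD a, b: set on unsigned carry out of bit 31.
bool carryAfterAdd(uint32_t a, uint32_t b) { return uint32_t(a + b) < a; }
```

For x = 5 and C = 1, both forms compute 4, yet SUB leaves CF clear and ADD of 0xFFFFFFFF sets it.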

llvm-svn: 316860
2017-10-29 06:51:04 +00:00
Craig Topper 7a60e29185 [X86] Fix typo in comment. NFC
llvm-svn: 316859
2017-10-29 06:51:02 +00:00
Craig Topper 0692ca4bd2 [X86] Remove invalid code from LowerVSELECT.
This code attempted to say that v8i16/v16i16 VSELECT is legal if BWI and VLX are enabled, but the only way we could reach this point is if the condition was not a vXi1 type, which means it really wasn't legal.

We don't have any tests that exercise this code. So I'm hoping it wasn't really reachable.

llvm-svn: 316851
2017-10-28 23:10:13 +00:00
Simon Pilgrim 294f88dfa0 [X86][SSE] Combine 128-bit target shuffles to PACKSS/PACKUS.
llvm-svn: 316845
2017-10-28 20:51:27 +00:00
Simon Pilgrim bd3852aa5e [X86][SSE] Split off matchVectorShuffleWithPACK. NFCI.
Split matchVectorShuffleWithPACK from lowerVectorShuffleWithPACK so that we can reuse it for target shuffle combines

llvm-svn: 316844
2017-10-28 20:27:22 +00:00
Simon Pilgrim 25808c303f [X86][SSE] Rename truncateVectorCompareWithPACKSS to truncateVectorWithPACKSS. NFC.
We no longer rely on the vector source being a comparison result, only on it having sufficient sign bits.

llvm-svn: 316834
2017-10-28 17:59:56 +00:00
Craig Topper b8d7d4d683 [X86] Improve handling of UDIVREM8_ZEXT_HREG/SDIVREM8_SEXT_HREG to support 64-bit extensions.
If the extend type is 64 bits, emit a 32-bit -> 64-bit extend after the UDIVREM8_ZEXT_HREG/SDIVREM8_SEXT_HREG operation.

This gives a shorter encoding for the second extend in the sext case, and allows us to completely remove the second extend in the zext case.

This also adds known bit and num sign bits support for UDIVREM8_ZEXT_HREG/SDIVREM8_SEXT_HREG.
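
The "HREG" in these node names refers to the high-byte register. An 8-bit DIV leaves the quotient in AL and the remainder in AH. The model below shows the zero-extend case under stated assumptions: the remainder is extended from AH into a 32-bit register, and on x86-64 a write to a 32-bit register also zeroes bits 63..32. The 64-bit value therefore needs no second zero-extend, and its known-zero bits (everything above bit 7) follow directly. The function names are hypothetical.

```cpp
#include <cassert>
#include <cstdint>

// 8-bit DIV: AX / divisor -> AL = quotient, AH = remainder.
// Assumes a nonzero divisor and a quotient that fits in 8 bits (otherwise #DE).
uint16_t div8(uint16_t ax, uint8_t divisor) {
  assert(divisor != 0 && ax / divisor <= 0xFF);
  uint8_t quot = uint8_t(ax / divisor);
  uint8_t rem = uint8_t(ax % divisor);
  return uint16_t((rem << 8) | quot);
}

// MOVZX r32, AH: zero-extend the remainder to 32 bits.
uint32_t remZext32(uint16_t ax) { return uint32_t(ax >> 8); }

// The 32-bit write already clears bits 63..32, so this is a free widening.
uint64_t remZext64(uint16_t ax) { return uint64_t(remZext32(ax)); }
```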

Differential Revision: https://reviews.llvm.org/D38275

llvm-svn: 316702
2017-10-26 21:12:03 +00:00
Sanjay Patel ac50f3e907 [x86] use an insert op to put one variable element into a vector of constants
Instead of loading (a potential ton of) scalar constants, load those as a vector and then insert into it.

Differential Revision: https://reviews.llvm.org/D38756

llvm-svn: 316685
2017-10-26 18:27:55 +00:00
Simon Pilgrim 5e8c3f328f [X86][AVX] ComputeNumSignBitsForTargetNode - add support for X86ISD::VTRUNC
llvm-svn: 316462
2017-10-24 17:04:57 +00:00
Simon Pilgrim 0a12c239b6 [X86] truncateVectorCompareWithPACKSS - use PACKSSDW/PACKSSWB instead of just PACKSSWB.
By using the widest type possible for PACKSS truncation, we have a better chance of being able to peek through bitcasts, and it improves other combines driven by ComputeNumSignBits.

llvm-svn: 316448
2017-10-24 15:38:16 +00:00
Simon Pilgrim c36dd6ae9c [X86] truncateVectorCompareWithPACKSS - remove duplicate variables. NFCI.
llvm-svn: 316440
2017-10-24 14:18:32 +00:00
Simon Pilgrim 321e54f72d [X86][SSE] combineBitcastvxi1 - use PACKSSWB directly to pack v8i16 to v16i8
This avoids difficulties determining the number of sign bits later on, when shuffle lowering tries to lower to PACKSS.

llvm-svn: 316383
2017-10-23 22:05:02 +00:00
Simon Pilgrim 1dcb913be6 [X86][SSE] Remove AssertZext stage from PEXTRW/PEXTRB lowering. NFCI.
Remove AssertZext and instead add PEXTRW/PEXTRB support to computeKnownBitsForTargetNode to simplify instruction selection.

Differential Revision: https://reviews.llvm.org/D39169

llvm-svn: 316336
2017-10-23 16:00:57 +00:00
Craig Topper fcf27188d7 [X86] Do not generate __multi3 for mul i128 on X86
Summary: __multi3 is not available on x86 (32-bit). Setting lib call name for MULI_128 to nullptr forces DAGTypeLegalizer::ExpandIntRes_MUL to generate instructions for 128-bit multiply instead of a call to an undefined function.  This fixes PR20871 though it may be worth looking at why licm and indvars combine to generate 65-bit multiplies in that test.
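
What "generate instructions for 128-bit multiply" means can be sketched as a schoolbook decomposition into narrower partial products. This is a generic sketch, not the exact sequence ExpandIntRes_MUL emits. `U128`, `mul64x64` and `mul128` are hypothetical names, and the 64-bit multiplies used here would themselves be split further on a 32-bit target.

```cpp
#include <cassert>
#include <cstdint>

struct U128 {
  uint64_t lo, hi;
};

// Full 64x64 -> 128-bit product built from 32-bit halves (schoolbook method).
U128 mul64x64(uint64_t a, uint64_t b) {
  uint64_t aLo = a & 0xFFFFFFFFu, aHi = a >> 32;
  uint64_t bLo = b & 0xFFFFFFFFu, bHi = b >> 32;
  uint64_t ll = aLo * bLo, lh = aLo * bHi, hl = aHi * bLo, hh = aHi * bHi;
  // Middle column: high half of ll plus the low halves of both cross terms.
  uint64_t mid = (ll >> 32) + (lh & 0xFFFFFFFFu) + (hl & 0xFFFFFFFFu);
  uint64_t lo = (mid << 32) | (ll & 0xFFFFFFFFu);
  uint64_t hi = hh + (lh >> 32) + (hl >> 32) + (mid >> 32);
  return {lo, hi};
}

// Low 128 bits of a 128x128 product, i.e. what `mul i128` returns.
U128 mul128(U128 a, U128 b) {
  U128 r = mul64x64(a.lo, b.lo);
  // Cross terms only reach the high 64 bits; a.hi * b.hi overflows out entirely.
  r.hi += a.lo * b.hi + a.hi * b.lo;
  return r;
}
```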

Patch by Riyaz V Puthiyapurayil

Reviewers: craig.topper, schweitz

Reviewed By: craig.topper, schweitz

Subscribers: RKSimon, llvm-commits

Differential Revision: https://reviews.llvm.org/D38668

llvm-svn: 316254
2017-10-21 02:26:00 +00:00
Simon Pilgrim 29b32472b4 [X86][SSE] getTargetShuffleMask - check shuffle input value types. NFCI.
To help identify shuffle combine issues

llvm-svn: 316222
2017-10-20 18:07:50 +00:00
Craig Topper 7bce79a539 [X86] Remove LowerEXTRACT_SUBVECTOR handler. All EXTRACT_SUBVECTORs are marked as legal.
llvm-svn: 316182
2017-10-19 20:59:40 +00:00
Simon Pilgrim fdd63d1535 [X86] Replace custom scalar integer absolute matching with ISD::ABS lowering.
x86 has its own copy of integer absolute pattern matching to combine directly to a SUB+CMOV.

This patch removes the x86 combine and adds custom lowering support for ISD::ABS instead, allowing us to use the DAGCombiner version.

Additional test cases are already covered by iabs.ll (rL315706 and rL315711).
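
The SUB+CMOV form mentioned above can be modeled in scalar code. It negates x (computing 0 - x) and then conditionally moves x back when the negation came out negative. The second function is a common branchless alternative shown only for comparison; this sketch does not claim it is what DAGCombiner emits. Both give abs(INT32_MIN) == INT32_MIN, matching ISD::ABS's wrapping semantics. The names are illustrative.

```cpp
#include <cassert>
#include <cstdint>

// abs(x) as SUB + CMOV: compute 0 - x, then select x if 0 - x is negative.
// Arithmetic is done on uint32_t so INT32_MIN wraps instead of being UB.
int32_t absSubCmov(int32_t x) {
  uint32_t neg = 0u - static_cast<uint32_t>(x);          // SUB 0, x (NEG)
  bool negIsNegative = static_cast<int32_t>(neg) < 0;     // sign of the SUB result
  return negIsNegative ? x : static_cast<int32_t>(neg);   // CMOV
}

// Branchless alternative: (x ^ s) - s, where s is all ones when x < 0.
int32_t absXorSub(int32_t x) {
  uint32_t s = static_cast<uint32_t>(x >> 31);
  return static_cast<int32_t>((static_cast<uint32_t>(x) ^ s) - s);
}
```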

Differential Revision: https://reviews.llvm.org/D38895

llvm-svn: 316162
2017-10-19 15:02:24 +00:00
Krzysztof Parzyszek 72518eaa6f Add iterator range MachineRegisterInfo::liveins(), adopt users, NFC
llvm-svn: 315927
2017-10-16 19:08:41 +00:00
Craig Topper a5af4a64d0 [AVX512] Don't mark EXTLOAD as legal with AVX512. Continue using custom lowering.
Summary:
This was impeding our ability to combine the extending shuffles with other shuffles as you can see from the test changes.

There's one special case that needed to be added to use VZEXT directly for v8i8->v8i64 since the custom lowering requires v64i8.

Reviewers: RKSimon, zvi, delena

Reviewed By: delena

Subscribers: llvm-commits

Differential Revision: https://reviews.llvm.org/D38714

llvm-svn: 315860
2017-10-15 16:41:17 +00:00
Craig Topper a9cd59fb5d [X86] Lower vselect with constant condition to vector_shuffle even with AVX512 instructions.
Summary:
It's better to use our shuffle lowering code to handle these than loading an immediate into a k-register.

It really feels like this should be a DAG combine optimization rather than a lowering operation, but that's a problem for another day.
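
A VSELECT whose condition is constant is simply a two-input shuffle. Lane i takes index i from the true operand or index i + N from the false operand. The model below illustrates that mapping; `selectToShuffleMask` is a hypothetical helper, and the mask convention follows LLVM's `shufflevector`.

```cpp
#include <array>
#include <cassert>
#include <cstddef>

constexpr std::size_t kN = 4;
using V = std::array<int, kN>;
using Cond = std::array<bool, kN>;
using ShufMask = std::array<int, kN>;

V vselect(const Cond& c, const V& t, const V& f) {
  V r{};
  for (std::size_t i = 0; i < kN; ++i) r[i] = c[i] ? t[i] : f[i];
  return r;
}

// Constant condition -> shuffle mask: index i selects t[i], index i + N selects f[i].
ShufMask selectToShuffleMask(const Cond& c) {
  ShufMask m{};
  for (std::size_t i = 0; i < kN; ++i) m[i] = c[i] ? int(i) : int(i + kN);
  return m;
}

V shuffle(const V& a, const V& b, const ShufMask& m) {
  V r{};
  for (std::size_t i = 0; i < kN; ++i) r[i] = m[i] < int(kN) ? a[m[i]] : b[m[i] - kN];
  return r;
}
```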

Reviewers: RKSimon, delena, zvi

Reviewed By: delena

Subscribers: llvm-commits

Differential Revision: https://reviews.llvm.org/D38932

llvm-svn: 315849
2017-10-15 06:39:07 +00:00
Simon Pilgrim 36fe00ee17 [X86][SSE] Don't attempt to reduce the imul vector width of odd sized vectors (PR34947)
llvm-svn: 315825
2017-10-14 19:57:19 +00:00
Simon Pilgrim f5b9f353c3 Pull out repeated calls to VT.getVectorNumElements(). NFCI.
llvm-svn: 315818
2017-10-14 17:37:42 +00:00
Simon Pilgrim cded82837d Use DAG::getBitcast() helper. NFCI.
llvm-svn: 315815
2017-10-14 17:14:42 +00:00
Simon Pilgrim f367c27d2d [X86][SSE] Support combining AND(EXTRACT(SHUF(X)), C) -> EXTRACT(SHUF(X))
If we are applying a byte mask to a value extracted from a shuffle, see if we can combine the mask into the shuffle.

Fixes the last issue with PR22415
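
The byte-mask idea can be sketched under two assumptions: the shuffle is a PSHUFB-style byte shuffle, in which an index byte with bit 7 set produces zero, and the extracted element is a little-endian 32-bit lane. Each 0x00 byte of C then turns the corresponding shuffle index into a "zero" index, and each 0xFF byte leaves it alone. Any other byte value blocks the fold. `foldByteMask` and the other names are hypothetical.

```cpp
#include <array>
#include <cassert>
#include <cstdint>
#include <optional>

using Bytes = std::array<uint8_t, 16>;

// PSHUFB model: an index byte with bit 7 set yields 0, else src[idx & 15].
Bytes pshufb(const Bytes& src, const Bytes& idx) {
  Bytes r{};
  for (int i = 0; i < 16; ++i) r[i] = (idx[i] & 0x80) ? 0 : src[idx[i] & 0x0F];
  return r;
}

// Extract 32-bit lane `lane` (little-endian) from a byte vector.
uint32_t extract32(const Bytes& v, int lane) {
  uint32_t r = 0;
  for (int j = 3; j >= 0; --j) r = (r << 8) | v[lane * 4 + j];
  return r;
}

// AND(EXTRACT(SHUF(X), lane), C) -> EXTRACT(SHUF'(X), lane) if every byte of C is 0x00 or 0xFF.
std::optional<Bytes> foldByteMask(const Bytes& idx, int lane, uint32_t c) {
  Bytes out = idx;
  for (int j = 0; j < 4; ++j) {
    uint8_t cb = uint8_t(c >> (8 * j));
    if (cb == 0x00) out[lane * 4 + j] = 0x80;   // byte must be zero: let the shuffle zero it
    else if (cb != 0xFF) return std::nullopt;   // partial byte mask: cannot fold
  }
  return out;
}
```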

llvm-svn: 315807
2017-10-14 15:01:36 +00:00
Craig Topper f6c69564e7 [X86] Use X86ISD::VBROADCAST in place of v2f64 X86ISD::MOVDDUP when AVX2 is available
This is particularly important for AVX512VL where we are better able to recognize the VBROADCAST loads to fold with other operations.

For AVX512VL we now use X86ISD::VBROADCAST for all of the patterns and remove the 128-bit X86ISD::VMOVDDUP.

We may be able to use this for AVX1 as well which would allow us to remove more isel patterns.

I also had to add X86ISD::VBROADCAST as a node to call combineShuffle for, so that we treat it similarly to X86ISD::MOVDDUP.

Differential Revision: https://reviews.llvm.org/D38836

llvm-svn: 315768
2017-10-13 21:56:48 +00:00