Commit Graph

6556 Commits

Author SHA1 Message Date
Craig Topper 0fff1e4f3d [X86] Consistently use MVT::i8 for the constant operand of BLENDI and INSERTPS nodes.
This is the type listed in the type constraint for isel. But since
we list a type there, it doesn't get checked during isel matching.

llvm-svn: 367775
2019-08-04 06:01:31 +00:00
Sanjay Patel c9171bd0a9 [x86] change free truncate hook to handle only simple types (PR42880)
This avoids the crash from:
https://bugs.llvm.org/show_bug.cgi?id=42880
...and I think it's a proper constraint for the TLI hook.

But that example raises questions about what happens to get us
into this situation (created i29 types) and what happens later
(why does legalization die on those types), so I'm not sure if
we will resolve the bug based on this change.

llvm-svn: 367766
2019-08-03 21:46:27 +00:00
Bill Wendling 41a2847a9a Emit diagnostic if an inline asm constraint requires an immediate
Summary:
An inline asm call can result in an immediate after inlining. Therefore emit a
diagnostic here if the constraint requires an immediate but one isn't supplied.

Reviewers: joerg, mgorny, efriedma, rsmith

Reviewed By: joerg

Subscribers: asb, rbar, johnrusso, simoncook, apazos, sabuasal, niosHD, zzheng, edward-jones, rogfer01, MartinMosbeck, brucehoult, the_o, PkmX, jocewei, s.egerton, MaskRay, jyknight, dylanmckay, javed.absar, fedor.sergeev, jrtc27, Jim, krytarowski, eraman, llvm-commits

Tags: #llvm

Differential Revision: https://reviews.llvm.org/D60942

llvm-svn: 367750
2019-08-03 05:52:47 +00:00
Craig Topper 45ea25289d [X86] Use the pointer VT for the Scale node when lowering x86 gather/scatter intrinsics.
This is consistent with the target independent intrinsic handling.

Not sure this really matters since we just pull the constant out
using getZExtValue later.

llvm-svn: 367736
2019-08-02 23:18:16 +00:00
Daniel Sanders 2bea69bf65 Finish moving TargetRegisterInfo::isVirtualRegister() and friends to llvm::Register as started by r367614. NFC
llvm-svn: 367633
2019-08-01 23:27:28 +00:00
Craig Topper a9ed5436bd [X86] In decomposeMulByConstant, legalize the VT before querying whether the multiply is legal
If a type is larger than a legal type and needs to be split, we would previously allow the multiply to be decomposed even if the split multiply is legal. Since the shift + add/sub code would also need to be split, it's not any better to decompose it.

This patch figures out what type the mul will eventually be legalized to and then uses that type for the query. I tried just returning false for illegal types and letting them get handled after type legalization, but then we can't recognize an i64 constant splat on 32-bit targets since it will be destroyed by type legalization. We could special case vectors of i64 to avoid that...
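
As a rough illustration (not from the patch itself) of the shift + add/sub decomposition being gated here, with the constants 5 and 7 chosen arbitrarily:

  #include <cassert>
  #include <cstdint>

  int main() {
    // mul-by-constant can be decomposed into a shift plus an add or sub,
    // e.g. X * 5 == (X << 2) + X and X * 7 == (X << 3) - X.
    for (uint32_t X = 0; X < 100000; ++X) {
      assert(X * 5 == (X << 2) + X);
      assert(X * 7 == (X << 3) - X);
    }
    return 0;
  }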

Differential Revision: https://reviews.llvm.org/D65533

llvm-svn: 367601
2019-08-01 18:49:07 +00:00
Simon Pilgrim 63d4114f72 [X86][SSE] Add PEXTR*(PINSR*(v, s, c), c) -> s combine.
We should probably extend this to cover bitcasts as well to help other cases in promote-vec3.ll.
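
A quick intrinsics-level view of the combine (illustrative only; the lane index and values are arbitrary):

  #include <emmintrin.h>
  #include <cassert>

  int main() {
    // Extracting the same lane that was just inserted simply returns the
    // inserted scalar, so the pextrw(pinsrw(v, s, c), c) pair folds to s.
    __m128i V = _mm_set1_epi16(7);
    int S = 12345;
    __m128i Ins = _mm_insert_epi16(V, S, 3);            // pinsrw lane 3
    assert(_mm_extract_epi16(Ins, 3) == (S & 0xFFFF));  // pextrw lane 3
    return 0;
  }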

llvm-svn: 367582
2019-08-01 16:38:39 +00:00
Simon Pilgrim 33f5f863b5 [X86][SSE] SimplifyMultipleUseDemandedBits - Add PEXTR/PINSR B+W handling
This adds SimplifyMultipleUseDemandedBitsForTargetNode X86 support and uses it to allow us to peek through vector insertions to avoid dependencies on entire insertion chains.

llvm-svn: 367570
2019-08-01 14:46:03 +00:00
Simon Pilgrim f99f9881e3 [X86] EltsFromConsecutiveLoads - don't attempt to merge volatile loads (PR42846)
llvm-svn: 367556
2019-08-01 13:13:18 +00:00
Amy Huang 153f20057c Revert "[MS] Emit S_HEAPALLOCSITE debug info in Selection DAG" and
partial fix.
Causes windows buildbot errors.

This reverts commit 6e65c34523963094acd0d6c94a5f5c64b32fe6aa and
53da7ca943.

llvm-svn: 367496
2019-07-31 23:59:31 +00:00
Craig Topper b51dc64063 [X86] Add DAG combine to fold any_extend_vector_inreg+truncstore to an extractelement+store
We have custom code that ignores the normal promoting type legalization on less than 128-bit vector types like v4i8 to emit pavgb, paddusb, psubusb since we don't have the equivalent instruction on a larger element type like v4i32. If this operation appears before a store, we can be left with an any_extend_vector_inreg followed by a truncstore after type legalization. When truncstore isn't legal, this will normally be decomposed into shuffles and a non-truncating store. This will then combine away the any_extend_vector_inreg and shuffle leaving just the store. On avx512, truncstore is legal so we don't decompose it and we had no combines to fix it.

This patch adds a new DAG combine to detect this case and emit either an extract_store for 64-bit stores or an extractelement+store for 32 and 16 bit stores. This makes the avx512 codegen match the avx2 codegen for these situations. I'm restricting to only when -x86-experimental-vector-widening-legalization is false. When we're widening we're not likely to create this any_extend_inreg+truncstore combination. This means we should be able to remove this code when we flip the default. I would like to flip the default soon, but I need to investigate some performance regressions it's causing in our branch that I wasn't seeing on trunk.

Differential Revision: https://reviews.llvm.org/D65538

llvm-svn: 367488
2019-07-31 22:43:08 +00:00
Simon Pilgrim 0707f66ad0 [X86] Moved IsNOT helper earlier. NFCI.
Makes it available for more combines to use without adding declarations.

llvm-svn: 367436
2019-07-31 14:36:04 +00:00
Simon Pilgrim 24ad2b5e7d [X86][AVX] Ensure chained subvector insertions are the same size (PR42833)
Before combining insert_subvector(insert_subvector(vec, sub0, c0), sub1, c1) patterns, ensure that the subvectors are all the same type. On AVX512 targets especially we might have a mixture of 128/256 subvector insertions.

llvm-svn: 367429
2019-07-31 12:55:39 +00:00
Amy Huang 53da7ca943 [MS] Emit S_HEAPALLOCSITE debug info in SelectionDAG
Summary: This emits labels around heapallocsite calls in SelectionDAG.

Reviewers: rnk

Subscribers: MatzeB, aprantl, hiraditya, llvm-commits

Tags: #llvm

Differential Revision: https://reviews.llvm.org/D61105

llvm-svn: 367374
2019-07-31 00:16:13 +00:00
Craig Topper 8b58371fae [X86] Fix mistake in comment. NFC
The code is matching sext not zext.

llvm-svn: 367357
2019-07-30 21:00:24 +00:00
Simon Pilgrim b989bc47c0 [X86] SimplifyDemandedVectorEltsForTargetNode should be calling resolveTargetShuffleInputs not getTargetShuffleMask
Add TODO comment.

llvm-svn: 367318
2019-07-30 15:06:09 +00:00
Simon Pilgrim e4d5423dcd [X86][AVX] SimplifyDemandedVectorElts - handle extraction from X86ISD::SUBV_BROADCAST source (PR42819)
PR42819 showed an issue that we couldn't handle the case where we demanded a 'sub-sub-vector' of the SUBV_BROADCAST 'sub-vector' source.

This patch recognizes these cases and extracts the sub-sub-vector instead of trying to broadcast to a type smaller than the 'sub-vector' source. 

llvm-svn: 367306
2019-07-30 11:35:13 +00:00
Craig Topper 479b45411e [X86] Fix typo in comment. We're looking at a right shift not a left shift. NFC
llvm-svn: 367251
2019-07-29 19:22:51 +00:00
Simon Pilgrim 962c03fac4 [X86] resolveTargetShuffleInputs - add depth to limit recursion.
Avoids slowdowns from calls to ComputeNumSignBits/computeKnownBits going too deep.

llvm-svn: 367240
2019-07-29 17:17:58 +00:00
Simon Pilgrim 5ab948f823 [X86] combineX86ShufflesRecursively - start recursion at depth = 0. NFCI.
As discussed on rL367171, we have a problem where the recursion depth used in combineX86ShufflesRecursively was subtly different from that of computeKnownBits etc. - it starts at Depth=1 instead of Depth=0 like the others and has a different maximum recursion depth.

This NFC patch fixes the recursion depth to start at 0, so we can more easily reuse depth values in calls to computeKnownBits etc. from combineX86ShufflesRecursively and its helper functions.

llvm-svn: 367232
2019-07-29 15:57:06 +00:00
Craig Topper eb1beabad9 [X86] Don't use PMADDWD for vector add reductions of multiplies if the mul inputs have an additional user.
The pmaddwd inserts a truncate; if that truncate would end up
creating additional instructions instead of making a zext
narrower, then we shouldn't do it.

I've restricted this to only sse4.1 targets since on prior
targets the zext will be done in stages. So the truncate will
probably not create additional instructions. Might need some
more investigation of mul shrinking and the other pmaddwd
transform to be sure this is the right decision.

There might be a slight regression on AVX1 targets due to add
splitting. Hard to say for sure. Maybe we need to look into
using the vector reduction flag to use 2 narrow loads and a
blend instead of extracting and inserting.

llvm-svn: 367198
2019-07-29 01:36:58 +00:00
Craig Topper 894916cac9 [X86] In combineLoopMAddPattern and combineLoopSADPattern, preserve the vector reduction flag on the final add. Handle unrolled loops by letting DAG combine revisit.
This reverts r340478 and r340631 and replaces them with a simpler
method of just letting DAG combine revisit the nodes to handle
the other operand.

llvm-svn: 367195
2019-07-28 18:45:42 +00:00
Simon Pilgrim 353a848473 [X86][SSE] Replace PMULDQ GetDemandedBits combine with SimplifyMultipleUseDemandedBits handler (Reapplied)
Recommit rL367100 which was reverted at rL367141. Until PR42777 is fixed, we no longer get the benefits of peeking through bitcasts but it does still remove a GetDemandedBits user and gives us the equivalent combines.

llvm-svn: 367172
2019-07-27 13:30:29 +00:00
Vlad Tsyrklevich 485b8789de Revert "[X86][SSE] Replace PMULDQ GetDemandedBits combine with SimplifyMultipleUseDemandedBits handler."
This reverts r367100, it appears to be causing test failures after
Nico's revert of r367091.

llvm-svn: 367141
2019-07-26 18:14:21 +00:00
Simon Pilgrim d93e8ece7b [X86][SSE] Replace PMULDQ GetDemandedBits combine with SimplifyMultipleUseDemandedBits handler.
This removes a GetDemandedBits user and allows us to benefit from the DemandedElts propagated through SimplifyDemandedBits.

llvm-svn: 367100
2019-07-26 11:10:20 +00:00
Simon Pilgrim 447fe31964 [X86] concatSubVectors - remove unnecessary args. NFCI.
All these args can be cheaply recomputed and it makes it much easier to use the function as a quick helper.

llvm-svn: 367014
2019-07-25 13:05:46 +00:00
Roman Lebedev 017e272c3a [Codegen] (X & (C l>>/<< Y)) ==/!= 0 --> ((X <</l>> Y) & C) ==/!= 0 fold
Summary:
This was originally reported in D62818.
https://rise4fun.com/Alive/oPH

InstCombine does the opposite fold, in the hope that the `C l>>/<< Y`
expression will be hoisted out of a loop if `Y` is invariant and `X` is not.
But as can be seen from the diffs here, if it didn't get hoisted,
the produced assembly is almost universally worse.

Much like with my recent "hoist add/sub by/from const" patches,
we should get an almost universal win if we hoist the constant:
there is almost always an "and/test by imm" instruction,
but "shift of imm" not so much, so we may avoid having to
materialize the immediate, and thus need one less register.
And since we now shift not by a constant, but by something else,
the live-range of that something else may shrink.

Special care needs to be taken not to disturb the x86 `BT` / hexagon `tstbit`
instruction patterns, and not to get into an endless combine loop.
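
A standalone brute-force check of the fold on 8-bit values (a sketch to convince yourself of the equivalence, not code from the patch):

  #include <cassert>
  #include <cstdio>

  int main() {
    // Logical shifts, Y < bit width:
    //   (X & (C << Y))  == 0  <=>  ((X l>> Y) & C) == 0
    //   (X & (C l>> Y)) == 0  <=>  ((X << Y) & C)  == 0
    for (unsigned X = 0; X < 256; ++X)
      for (unsigned C = 0; C < 256; ++C)
        for (unsigned Y = 0; Y < 8; ++Y) {
          bool L1 = (X & ((C << Y) & 0xFF)) == 0;
          bool R1 = ((X >> Y) & C) == 0;
          assert(L1 == R1);
          bool L2 = (X & (C >> Y)) == 0;
          bool R2 = (((X << Y) & 0xFF) & C) == 0;
          assert(L2 == R2);
        }
    puts("fold holds for all 8-bit X, C and Y in [0,7]");
    return 0;
  }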

Reviewers: RKSimon, efriedma, t.p.northover, craig.topper, spatel, arsenm

Reviewed By: spatel

Subscribers: hiraditya, MaskRay, wuzish, xbolva00, nikic, nemanjai, jvesely, wdng, nhaehnle, javed.absar, tpr, kristof.beyls, jsji, llvm-commits

Tags: #llvm

Differential Revision: https://reviews.llvm.org/D62871

llvm-svn: 366955
2019-07-24 22:57:22 +00:00
Simon Pilgrim 7d318b2bb1 [DAGCombine] matchBinOpReduction - add partial reduction matching
This patch adds support for recognizing cases where a larger vector type is being used to reduce just the elements in the lower subvector:

e.g. <8 x i32> reduction pattern in a <16 x i32> vector:

<4,5,6,7,u,u,u,u,u,u,u,u,u,u,u,u>
<2,3,u,u,u,u,u,u,u,u,u,u,u,u,u,u>
<1,u,u,u,u,u,u,u,u,u,u,u,u,u,u,u>

matchBinOpReduction returns the lower extracted subvector in such cases, assuming isExtractSubvectorCheap accepts the extraction.

I've only enabled it for X86 reduction sums so far. I intend to enable it for the bitop/minmax cases in future patches, and eventually I think it's worth turning it on all the time. This is mainly just a case of ensuring calls to matchBinOpReduction don't make assumptions on the vector width based on the original vector extraction.

Fixes the x86 partial reduction sum cases in PR33758 and PR42023.
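
A scalar sketch of the three shuffle+add steps in the mask list above (illustrative only, with made-up lane values):

  #include <array>
  #include <cassert>
  #include <numeric>

  int main() {
    // 16-lane vector, but we only want the sum of lanes 0..7.
    std::array<int, 16> V;
    std::iota(V.begin(), V.end(), 1); // 1, 2, ..., 16

    std::array<int, 16> A = V;
    for (int i = 0; i < 4; ++i) A[i] += A[i + 4]; // add(v, shuffle<4,5,6,7,u,...>)
    for (int i = 0; i < 2; ++i) A[i] += A[i + 2]; // add(v, shuffle<2,3,u,...>)
    A[0] += A[1];                                 // add(v, shuffle<1,u,...>)

    // Lane 0 now holds the reduction of the lower subvector.
    assert(A[0] == std::accumulate(V.begin(), V.begin() + 8, 0));
    return 0;
  }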

Differential Revision: https://reviews.llvm.org/D65047

llvm-svn: 366933
2019-07-24 17:29:56 +00:00
Craig Topper 76bc3d6e07 [X86] In lowerVectorShuffle, instead of creating a new node to canonicalize the shuffle mask by commuting, just commute the mask and swap V1/V2.
LegalizeDAG tries to legalize the DAG by legalizing nodes before
their operands.

If we create a new node, we end up legalizing it after its operands.
This prevents some of the optimizations that can be done when the
operand is a build_vector since the build_vector will have been
legalized to something else.

Differential Revision: https://reviews.llvm.org/D65132

llvm-svn: 366835
2019-07-23 18:46:15 +00:00
Craig Topper 510e6fadaa [X86] When using AND+PACKUS in lowerV16I8Shuffle, generate the build vector directly in v16i8 with the correct 0x00 or 0xFF elements rather than using another VT and bitcasting it.
The build_vector will become a constant pool load. By using the
desired type initially, it ensures we don't generate a bitcast
of the constant pool load which will need to be folded with
the load.

While experimenting with another patch, I noticed that when the
load type and the constant pool type don't match, then
SimplifyDemandedBits can't handle it. While we should probably
fix that, this was a simple way to fix the issue I saw.

llvm-svn: 366732
2019-07-22 19:58:49 +00:00
Simon Pilgrim b3d719e1cf [X86] EltsFromConsecutiveLoads - support common source loads (REAPPLIED)
This patch enables us to find the source loads for each element, splitting them into a Load and ByteOffset, and attempts to recognise consecutive loads that are in fact from the same source load.

A helper function, findEltLoadSrc, recurses to find a LoadSDNode and determines the element's byte offset within it. When attempting to match consecutive loads, byte-offsetted loads are then matched against a previous load that has already been confirmed to be a consecutive match.

Next step towards PR16739 - after this we just need to account for shuffling/repeated elements to create a vector load + shuffle.

Fixed out of bounds load assert identified in rL366501

Differential Revision: https://reviews.llvm.org/D64551

llvm-svn: 366681
2019-07-22 12:44:10 +00:00
Simon Pilgrim 86fa3270ef [X86] SimplifyDemandedVectorEltsForTargetNode - Move SUBV_BROADCAST narrowing handling. NFCI.
Move the narrowing of SUBV_BROADCAST to where we handle all the other opcodes.

llvm-svn: 366660
2019-07-21 19:04:44 +00:00
Simon Pilgrim adec0f2252 [X86][SSE] Use PSADBW to improve vXi8 sum reduction (PR42674)
As detailed on PR42674, we can reduce a vXi8 down until we have the final <8 x i8>, and then use PSADBW with zero, to sum those values. We then extract the bottom i8, discarding any overflow from the upper bits of the i16 result.
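
A small SSE2 intrinsics sketch of the PSADBW-with-zero trick (not the lowering code itself; the byte values are arbitrary):

  #include <emmintrin.h>
  #include <cassert>
  #include <cstdint>

  int main() {
    alignas(16) uint8_t Bytes[16] = {200, 100, 37, 1, 2, 3, 4, 5}; // low 8 lanes, rest zero
    __m128i V = _mm_load_si128(reinterpret_cast<const __m128i *>(Bytes));
    __m128i Sad = _mm_sad_epu8(V, _mm_setzero_si128());            // psadbw against zero
    uint8_t Sum = static_cast<uint8_t>(_mm_cvtsi128_si32(Sad));    // keep the bottom i8

    uint8_t Expected = 0; // wrapping i8 sum of the low 8 lanes
    for (int i = 0; i < 8; ++i) Expected += Bytes[i];
    assert(Sum == Expected);
    return 0;
  }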

llvm-svn: 366636
2019-07-20 15:20:11 +00:00
Reid Kleckner ba9c9e62cb Revert [X86] EltsFromConsecutiveLoads - support common source loads
This reverts r366441 (git commit 48104ef7c9)

This causes clang to fail to compile some file in Skia. Reduction soon.

llvm-svn: 366501
2019-07-18 21:26:41 +00:00
Simon Pilgrim 48104ef7c9 [X86] EltsFromConsecutiveLoads - support common source loads
This patch enables us to find the source loads for each element, splitting them into a Load and ByteOffset, and attempts to recognise consecutive loads that are in fact from the same source load.

A helper function, findEltLoadSrc, recurses to find a LoadSDNode and determines the element's byte offset within it. When attempting to match consecutive loads, byte-offsetted loads are then matched against a previous load that has already been confirmed to be a consecutive match.

Next step towards PR16739 - after this we just need to account for shuffling/repeated elements to create a vector load + shuffle.

Differential Revision: https://reviews.llvm.org/D64551

llvm-svn: 366441
2019-07-18 14:33:25 +00:00
Craig Topper 8da0402210 [X86] Disable combineConcatVectors for vXi1 vectors.
I'm not convinced the code this calls is properly vetted for
vXi1 vectors. Experimental vector widening legalization testing
for D55251 is now hitting an assertion failure inside
EltsFromConsecutiveLoads. This is occurring from a v2i1 load
having a store size different than its VT size. Hopefully
this commit will keep such issues from happening.

llvm-svn: 366405
2019-07-18 06:18:06 +00:00
Craig Topper 61fff7a337 [X86] Make sure we mark 128/256 MLOAD as Legal with VLX when min-legal-vector-width=256 is in effect.
This started triggering an assertion after r364718 when we made
these Custom under AVX2.

llvm-svn: 366382
2019-07-17 22:26:00 +00:00
Sanjay Patel d746a210e1 [x86] use more phadd for reductions
This is part of what is requested by PR42023:
https://bugs.llvm.org/show_bug.cgi?id=42023

There's an extension needed for FP add, but exactly how we would specify
that using flags is not clear to me, so I left that as a TODO.
We're still missing patterns for partial reductions when the input vector
is 256-bit or 512-bit, but I think that's a failure of vector narrowing.
If we can reduce the widths, then this matching should work on those tests.
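
For reference, a minimal SSSE3 intrinsics example of the phadd-based reduction shape being matched (an illustration, not the new lowering code):

  #include <tmmintrin.h>
  #include <cassert>

  int main() {
    // phaddd sums adjacent pairs, so two of them collapse a v4i32 to a scalar.
    __m128i V = _mm_setr_epi32(1, 2, 3, 4);
    __m128i H1 = _mm_hadd_epi32(V, V);   // {3, 7, 3, 7}
    __m128i H2 = _mm_hadd_epi32(H1, H1); // {10, 10, 10, 10}
    assert(_mm_cvtsi128_si32(H2) == 1 + 2 + 3 + 4);
    return 0;
  }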

Differential Revision: https://reviews.llvm.org/D64760

llvm-svn: 366268
2019-07-16 21:30:41 +00:00
Craig Topper c0b2ed664b [X86] In combineStore, don't convert v2f32 load/store pairs to f64 loads/stores.
Type legalization can take care of this. This gives DAG combine
a little more time with the original types.

llvm-svn: 366182
2019-07-16 05:52:27 +00:00
Rui Ueyama 49a3ad21d6 Fix parameter name comments using clang-tidy. NFC.
This patch applies clang-tidy's bugprone-argument-comment tool
to LLVM, clang and lld source trees. Here is how I created this
patch:

$ git clone https://github.com/llvm/llvm-project.git
$ cd llvm-project
$ mkdir build
$ cd build
$ cmake -GNinja -DCMAKE_BUILD_TYPE=Debug \
    -DLLVM_ENABLE_PROJECTS='clang;lld;clang-tools-extra' \
    -DCMAKE_EXPORT_COMPILE_COMMANDS=On -DLLVM_ENABLE_LLD=On \
    -DCMAKE_C_COMPILER=clang -DCMAKE_CXX_COMPILER=clang++ ../llvm
$ ninja
$ parallel clang-tidy -checks='-*,bugprone-argument-comment' \
    -config='{CheckOptions: [{key: StrictMode, value: 1}]}' -fix \
    ::: ../llvm/lib/**/*.{cpp,h} ../clang/lib/**/*.{cpp,h} ../lld/**/*.{cpp,h}

llvm-svn: 366177
2019-07-16 04:46:31 +00:00
Sanjay Patel eb99165b97 [x86] try to keep FP casted+truncated+extracted vector element out of GPRs
inttofp (trunc (extelt X, 0)) --> inttofp (extelt (bitcast X), 0)

We have pseudo-vectorization of scalar int to FP casts, so this tries to
make that more likely by replacing a truncate with a bitcast. I didn't see
any test diffs starting from 'uitofp', so I left that as a TODO. We can't
only match the shorter trunc+extract pattern because there's an opposing
transform somewhere, so we infinite loop. Waiting to try this during
lowering is another possibility.

A motivating case is shown in PR39975 and included in the test diffs here:
https://bugs.llvm.org/show_bug.cgi?id=39975

Differential Revision: https://reviews.llvm.org/D64710

llvm-svn: 366098
2019-07-15 18:17:23 +00:00
Craig Topper 81971b2b79 [X86] Return UNDEF from LowerScalarImmediateShift when the shift amount is out of range.
I think we only turn out-of-range shifts to undef when
all elements are out of range or the shift amount is a splat that is out
of range. I'm not sure which; I didn't check.

During lowering we can split a shift where some elements
are out of range into multiple shifts. This can create a
new shift with a splat shift amount that is out of range.

This patch returns undef for this case.

Fixes PR42615.

Differential Revision: https://reviews.llvm.org/D64699

llvm-svn: 366096
2019-07-15 17:56:57 +00:00
Simon Pilgrim 60fb5e97a0 [X86] isTargetShuffleEquivalent - assert the expected mask is correctly formed. NFCI.
While we don't make any assumptions about the actual mask, assert that the expected mask only contains valid mask element values.

llvm-svn: 366066
2019-07-15 14:29:14 +00:00
Craig Topper 635d103e0b [X86] Separate the memory size of vzext_load/vextract_store from the element size of the result type. Use them improve the codegen of v2f32 loads/stores with sse1 only.
Summary:
SSE1 only supports v4f32, but it does have instructions like movlps/movhps that load/store 64 bits of memory.

This patch breaks the connection between the node VT of the vzext_load/vextract_store patterns and the memory VT. Enabling a v4f32 node with a 64-bit memory VT. I've used i64 as the memory VT here. I've written the PatFrag predicate to just check the store size not the specific VT. I think the VT will only matter for CSE purposes. We could use v2f32, but if we want to start using these operations in more places a simple integer type might make the most sense.

I'd like to maybe use this same thing for SSE2 and later as well, but that will need more work to be supported by EltsFromConsecutiveLoads to avoid regressing lit tests. I'd maybe also like to combine bitcasts with these load/stores nodes now that the types are disconnected. And I'd also like to consider canonicalizing (scalar_to_vector + load) to vzext_load.

If you want I can split the mechanical tablegen stuff where I added the 32/64 off from the sse1 change.

Reviewers: spatel, RKSimon

Reviewed By: RKSimon

Subscribers: hiraditya, llvm-commits

Tags: #llvm

Differential Revision: https://reviews.llvm.org/D64528

llvm-svn: 366034
2019-07-15 02:02:31 +00:00
Sanjay Patel 2097f75eab [x86] simplify cmov with same true/false operands
llvm-svn: 365998
2019-07-13 12:04:52 +00:00
Sanjay Patel 5cc7c9ab93 [X86] Merge negated ISD::SUB nodes into X86ISD::SUB equivalent (PR40483)
Follow up to D58597, where it was noted that the commuted ISD::SUB variant
was having problems with lack of combines.

See also D63958 where we untangled setcc/sub pairs.

Differential Revision: https://reviews.llvm.org/D58875

llvm-svn: 365791
2019-07-11 15:56:33 +00:00
Craig Topper 021ba49b31 [X86] Remove unused variable. NFC
llvm-svn: 365697
2019-07-10 21:01:34 +00:00
Simon Pilgrim 5dd2af5248 [X86] EltsFromConsecutiveLoads - clean up element size calcs. NFCI.
Determine the element/load size calculations earlier and assert that they are whole bytes in size.

llvm-svn: 365674
2019-07-10 17:49:27 +00:00
Simon Pilgrim 093f4aa72f [X86] EltsFromConsecutiveLoads - remove duplicate check for element size. NFCI.
We've already checked that each element is the correct contributory size for VT when we inspect the elements for Undef/Zero/Load.

llvm-svn: 365656
2019-07-10 16:22:31 +00:00
Simon Pilgrim 893448a3e4 [X86] EltsFromConsecutiveLoads - ensure element reg/store sizes are the same size. NFCI.
This renames the type so it doesn't sound like it's based off the load size - as we're moving towards supporting combining loads of different sizes.

llvm-svn: 365655
2019-07-10 16:14:26 +00:00
Simon Pilgrim 0a9479ef39 [X86] EltsFromConsecutiveLoads - cleanup Zero/Undef/Load element collection. NFCI.
llvm-svn: 365628
2019-07-10 13:28:13 +00:00
Simon Pilgrim ef1aac3191 [X86] EltsFromConsecutiveLoads - LDBase is non-null. NFCI.
Don't bother checking for LDBase != null - it should be (and we assert that it is).

llvm-svn: 365622
2019-07-10 12:22:59 +00:00
Simon Pilgrim c972193583 [X86] EltsFromConsecutiveLoads - store Loads on a per-element basis. NFCI.
Cache the LoadSDNode nodes so we can easily map to/from the element index instead of packing them together - this will be useful for future patches for PR16739 etc.

llvm-svn: 365620
2019-07-10 11:26:57 +00:00
Simon Pilgrim 6a58583951 [X86][SSE] EltsFromConsecutiveLoads - add basic dereferenceable support
This patch checks to see if the vector element loads are based off a dereferenceable pointer that covers the entire vector width, in which case we don't need to have element loads at both extremes of the vector width - just the start (base pointer) of it.

Another step towards partial vector loads......

Differential Revision: https://reviews.llvm.org/D64205

llvm-svn: 365614
2019-07-10 10:46:36 +00:00
Craig Topper 50f70de557 [X86] Limit getTargetConstantFromNode to only work on NormalLoads not extending loads.
This seems to fix a failure reported by Jordan Rupprecht, but we
don't have a reduced test case yet.

llvm-svn: 365589
2019-07-10 00:40:01 +00:00
Craig Topper 1ae60797cd [X86] Don't form extloads in combineExtInVec unless the load extension is legal.
This should prevent doing this on pre-sse4.1 targets or for 256
bit vectors without avx2.

I don't know of a failure from this. Op legalization will probably
take care of it, but it seemed better to be safe.

llvm-svn: 365577
2019-07-09 23:05:54 +00:00
Craig Topper 84a1f07363 [X86][AMDGPU][DAGCombiner] Move call to allowsMemoryAccess into isLoadBitCastBeneficial/isStoreBitCastBeneficial to allow X86 to bypass it
Basically the problem is that X86 doesn't set the Fast flag from
allowsMemoryAccess on certain CPUs due to slow unaligned memory
subtarget features. This prevents bitcasts from being folded into
loads and stores. But all vector loads and stores of the same width
are the same cost on X86.

This patch merges the allowsMemoryAccess call into isLoadBitCastBeneficial to allow X86 to skip it.

Differential Revision: https://reviews.llvm.org/D64295

llvm-svn: 365549
2019-07-09 19:55:28 +00:00
Simon Pilgrim 294f37561a [X86] LowerToHorizontalOp - use count_if to count non-UNDEF ops. NFCI.
llvm-svn: 365540
2019-07-09 19:19:17 +00:00
Reid Kleckner 2f07c2e9d9 Standardize on MSVC behavior for triples with no environment
Summary:
This makes it so that IR files using triples without an environment work
out of the box, without normalizing them.

Typically, the MSVC behavior is more desirable. For example, it tends to
enable things like constant merging, use of associative comdats, etc.

Addresses PR42491

Reviewers: compnerd

Subscribers: hiraditya, dexonsmith, llvm-commits

Tags: #llvm

Differential Revision: https://reviews.llvm.org/D64109

llvm-svn: 365387
2019-07-08 21:05:20 +00:00
Simon Pilgrim e1a9b49d6b [X86] ISD::INSERT_SUBVECTOR - use uint64_t index. NFCI.
Keep the uint64_t type from getConstantOperandVal to stop truncation/extension overflow warnings in MSVC in subvector index math.

llvm-svn: 365328
2019-07-08 14:52:56 +00:00
Simon Pilgrim a7145c45a7 [X86] SimplifyDemandedVectorEltsForTargetNode - fix shadow variable warning. NFCI.
Fixes cppcheck warning.

llvm-svn: 365271
2019-07-06 18:46:09 +00:00
Simon Pilgrim 01f1bad618 [X86] LowerBuildVectorv16i8 - pull out repeated getOperand() call. NFCI.
llvm-svn: 365270
2019-07-06 18:33:29 +00:00
Simon Pilgrim 8b25d9bf01 [X86][SSE] LowerINSERT_VECTOR_ELT - early out for out of range indices
Fixes OSS-Fuzz #15662

llvm-svn: 365180
2019-07-05 10:34:53 +00:00
Simon Pilgrim fde766de4b [X86][AVX1] Combine concat_vectors(pshufd(x,c),pshufd(y,c)) -> vpermilps(concat_vectors(x,y),c)
Bitcast v4i32 to v8f32 and back again - it might be worth adding isel patterns for X86PShufd v8i32 on AVX1 targets like we did for X86Blendi to avoid the bitcasts?

llvm-svn: 365125
2019-07-04 10:17:10 +00:00
Craig Topper 163b8bb3f5 [X86] Use pointer sized indices instead of i32 for EXTRACT_VECTOR_ELT and INSERT_VECTOR_ELT in a couple places.
Most places already did this.

llvm-svn: 365109
2019-07-04 06:21:54 +00:00
Simon Pilgrim 26812c7675 [X86] ComputeNumSignBitsForTargetNode - add target shuffle support.
llvm-svn: 365057
2019-07-03 17:06:59 +00:00
Simon Pilgrim 783dbe402f [X86][AVX] combineX86ShufflesRecursively - peek through extract_subvector
If we have more then 2 shuffle ops to combine, try to use combineX86ShuffleChainWithExtract to see if some are from the same super vector.

llvm-svn: 365050
2019-07-03 15:46:08 +00:00
Simon Pilgrim 868d0b7fd9 [X86][AVX] Combine vpermi(bitcast(x)) -> bitcast(vpermi(x))
iff the number of elements doesn't change.

This gets around an issue with combineX86ShuffleChain not being able to hint which domain is preferred for shuffles that can be done with either.

Fixes regression introduced in rL365041

llvm-svn: 365044
2019-07-03 14:34:16 +00:00
Simon Pilgrim 0c230209fe [X86][AVX] combineX86ShuffleChainWithExtract - add number of non-zero extract_subvectors to the combine depth
This better accounts for the cost/benefit of removing extract_subvectors from the shuffle and will be more useful in future patches.

The vpermq predicate regression will be fixed shortly.

llvm-svn: 365041
2019-07-03 14:17:21 +00:00
Simon Pilgrim 8c099cbe7c [X86][SSE] lowerUINT_TO_FP_v2i32 - explicitly cast half word to double
Fixes MSVC analyzer extension->double warning.

llvm-svn: 365027
2019-07-03 11:23:27 +00:00
Simon Pilgrim 8df90b843d [X86][SSE] LowerINSERT_VECTOR_ELT - ensure insertion index correctness. NFCI.
Assert that the insertion index is in range and use uint64_t for the index to fix MSVC/cppcheck truncation warning.

llvm-svn: 365025
2019-07-03 10:59:52 +00:00
Simon Pilgrim 8853bd9592 [X86][SSE] LowerScalarImmediateShift - ensure shift amount correctness. NFCI.
Assert that the shift amount is in range and create vXi8 shift masks in a way that doesn't cause MSVC/cppcheck shift result is truncated then extended warnings.

llvm-svn: 365024
2019-07-03 10:47:33 +00:00
Simon Pilgrim 64e3a51534 Fix uninitialized variable warnings. NFCI.
Both MSVC and cppcheck don't like the fact that the variables are initialized via references.

llvm-svn: 365018
2019-07-03 10:22:08 +00:00
Simon Pilgrim 7b7b9b78a2 [X86] LowerFunnelShift - use modulo constant shift amount.
This avoids the use of getZExtValue and uses the modulo shift amount which is what's expected for funnel shifts anyhow.
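
For context, funnel shifts take their shift amount modulo the bit width, roughly this scalar model (a sketch, not the lowering code):

  #include <cassert>
  #include <cstdint>

  // fshl on i32: concatenate Hi:Lo, shift left by Amt % 32, return the high word.
  static uint32_t fshl32(uint32_t Hi, uint32_t Lo, uint32_t Amt) {
    Amt %= 32;
    if (Amt == 0)
      return Hi;
    return (Hi << Amt) | (Lo >> (32 - Amt));
  }

  int main() {
    assert(fshl32(0x12345678u, 0x9ABCDEF0u, 8) == 0x3456789Au);
    // An out-of-range amount behaves as amount % 32.
    assert(fshl32(0x12345678u, 0x9ABCDEF0u, 40) == fshl32(0x12345678u, 0x9ABCDEF0u, 8));
    return 0;
  }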

llvm-svn: 365016
2019-07-03 10:04:16 +00:00
Craig Topper b770d2c9d4 [X86] Add a DAG combine for turning *_extend_vector_inreg+load into an appropriate extload if the load isn't volatile.
Remove the corresponding isel patterns that did the same thing without checking for volatile.

This fixes another variation of PR42079

llvm-svn: 364977
2019-07-02 23:20:03 +00:00
Simon Pilgrim 5613874947 [X86] getTargetConstantBitsFromNode - remove unnecessary getZExtValue() (PR42486)
Don't use APInt::getZExtValue() if you can avoid it - eventually someone will call it with i128 or something that doesn't fit into 64-bits.

In this case it was completely superfluous as we'd moved the rest of the code to always use APInt.

Fixes the <1 x i128> addition bug in PR42486

llvm-svn: 364953
2019-07-02 18:20:38 +00:00
Simon Pilgrim 9304168103 [X86][AVX] combineX86ShuffleChain - pull out CombineShuffleWithExtract lambda. NFCI.
Pull out CombineShuffleWithExtract lambda to new combineX86ShuffleChainWithExtract wrapper and refactored it to handle more than 2 shuffle inputs - this will allow combineX86ShufflesRecursively to call this in a future patch.

llvm-svn: 364924
2019-07-02 13:30:04 +00:00
Simon Pilgrim d609ebb779 [X86] resolveTargetShuffleInputsAndMask - add repeated input handling.
We were relying on combineX86ShufflesRecursively to handle this - this patch gets it done earlier which should make it easier for other code to use resolveTargetShuffleInputsAndMask.

llvm-svn: 364906
2019-07-02 10:53:17 +00:00
Craig Topper 5e7815b695 [X86] Correct v4f32->v2i64 cvt(t)ps2(u)qq memory isel patterns
These instructions only read 64-bits of memory so we shouldn't
allow a full vector width load to be pattern matched in case it
is marked volatile.

Instead allow vzload or scalar_to_vector+load.

Also add a DAG combine to turn full vector loads into vzload when
used by one of these instructions if the load isn't volatile.

This fixes another case for PR42079

llvm-svn: 364838
2019-07-01 19:01:37 +00:00
Simon Pilgrim e3e38cce4a [X86] Add widenSubVector to size in bits helper. NFCI.
We can already widenSubVector to a specific type (of the same scalar type) - this variant just specifies the target vector size.

This will be useful when CombineShuffleWithExtract relaxes the need to have the same scalar type for all shuffle operand subvector sources.

llvm-svn: 364803
2019-07-01 16:20:47 +00:00
Simon Pilgrim 172fe5dd19 [X86] CombineShuffleWithExtract - updated description comments. NFCI.
CombineShuffleWithExtract no longer requires that both shuffle ops are extract_subvectors, from the same type or from the same size.

llvm-svn: 364745
2019-07-01 11:33:45 +00:00
Craig Topper 4ca81a9b99 [X86] Add a DAG combine to replace vector loads feeding a v4i32->v2f64 CVTSI2FP/CVTUI2FP node with a vzload.
But only when the load isn't volatile.

This improves load folding during isel where we only have vzload
and scalar_to_vector+load patterns. We can't have full vector load
isel patterns for the same volatile load issue.

Also add some missing masked cvtsi2fp/cvtui2fp with vzload patterns.

llvm-svn: 364728
2019-07-01 07:09:31 +00:00
Craig Topper 725a8a5dc4 [X86] Custom lower AVX masked loads to masked load and vselect instead of selecting a maskmov+vblend during isel.
AVX masked loads only support 0 as the value for masked off elements.
So we need an extra blend to support other values. Previously we
expanded the masked load to two instructions with isel patterns.
With this patch we now insert the vselect during lowering and it
will be separately selected as a blend.
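
An intrinsics-level illustration of why the extra blend is needed (the values and mask are arbitrary; this is not the lowering code):

  #include <immintrin.h>
  #include <cassert>

  int main() {
    // vmaskmovps produces 0.0f in masked-off lanes, so any other "passthru"
    // value requires a blend/vselect afterwards.
    float Src[8] = {1, 2, 3, 4, 5, 6, 7, 8};
    __m256i Mask = _mm256_setr_epi32(-1, 0, -1, 0, -1, 0, -1, 0);
    __m256 Passthru = _mm256_set1_ps(42.0f);

    __m256 Loaded = _mm256_maskload_ps(Src, Mask);
    __m256 Result = _mm256_blendv_ps(Passthru, Loaded, _mm256_castsi256_ps(Mask));

    float Out[8];
    _mm256_storeu_ps(Out, Result);
    assert(Out[0] == 1.0f && Out[1] == 42.0f && Out[2] == 3.0f && Out[7] == 42.0f);
    return 0;
  }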

llvm-svn: 364718
2019-06-30 06:46:37 +00:00
Simon Pilgrim 978a08c885 [X86] CombineShuffleWithExtract - recurse through EXTRACT_SUBVECTOR chain
llvm-svn: 364667
2019-06-28 17:57:32 +00:00
Simon Pilgrim a54e1a0f01 [X86] CombineShuffleWithExtract - only require 1 source to be EXTRACT_SUBVECTOR
We were requiring that both shuffle operands were EXTRACT_SUBVECTORs, but we can relax this to only require one of them to be.

Also, we shouldn't bother attempting this if both operands are from the lowest subvector (or not EXTRACT_SUBVECTOR at all).

llvm-svn: 364644
2019-06-28 12:24:49 +00:00
Craig Topper cbb88a5169 [X86] Connect the output chain properly when combining vzext_movl+load into vzext_load.
llvm-svn: 364625
2019-06-28 06:58:50 +00:00
Sanjay Patel a95ca2b5ff [x86] prevent crashing from select narrowing with AVX512
llvm-svn: 364585
2019-06-27 20:16:58 +00:00
Simon Pilgrim 1fd1c60979 [X86] combineX86ShufflesRecursively - merge shuffles with more than 2 inputs
We already had the infrastructure for this, but were waiting for the fix for a number of regressions which were handled by the recent shuffle(extract_subvector(),extract_subvector()) -> extract_subvector(shuffle()) shuffle combines

llvm-svn: 364569
2019-06-27 17:30:51 +00:00
Simon Pilgrim e9a2f4fe2c Use getConstantOperandAPInt instead of getConstantOperandVal for comparisons.
getConstantOperandAPInt avoids any large integer issues - these are unlikely but the fuzzers do like to mess around.....

llvm-svn: 364564
2019-06-27 16:46:00 +00:00
Simon Pilgrim 74343eba37 [X86] getTargetVShiftByConstNode - reduce variable scope. NFCI.
Fixes cppcheck warning.

llvm-svn: 364561
2019-06-27 16:33:44 +00:00
Simon Pilgrim c5cff5d3d1 [X86] getFauxShuffle - add DemandedElts as a filter
This is currently benign but will be used in the future based on the elements referenced by the parent shuffle(s).

llvm-svn: 364530
2019-06-27 12:35:52 +00:00
Simon Pilgrim 90e121fbe6 [X86][AVX] SimplifyDemandedVectorElts - combine PERMPD(x) -> EXTRACTF128(X)
If we only use the bottom lane, see if we can simplify this to extract_subvector - which is always at least as quick as PERMPD/PERMQ.

llvm-svn: 364518
2019-06-27 11:16:03 +00:00
Djordje Todorovic 7eeeb5947e [ISEL][X86] Tracking of registers that forward call arguments
While lowering calls, collect info about registers that forward arguments
into the following function frame. We store such info into the MachineFunction
of the call. This is used very late when dumping DWARF info about
call site parameters.

([9/13] Introduce the debug entry values.)

Co-authored-by: Ananth Sowda <asowda@cisco.com>
Co-authored-by: Nikola Prica <nikola.prica@rt-rk.com>
Co-authored-by: Ivan Baev <ibaev@cisco.com>

Differential Revision: https://reviews.llvm.org/D60715

llvm-svn: 364516
2019-06-27 10:51:15 +00:00
Mikael Holmen 7b81b61368 Silence gcc warning after r364458
Without the fix gcc 7.4.0 complains with

../lib/Target/X86/X86ISelLowering.cpp: In function 'bool getFauxShuffleMask(llvm::SDValue, llvm::SmallVectorImpl<int>&, llvm::SmallVectorImpl<llvm::SDValue>&, llvm::SelectionDAG&)':
../lib/Target/X86/X86ISelLowering.cpp:6690:36: error: enumeral and non-enumeral type in conditional expression [-Werror=extra]
             int Idx = (ZeroMask[j] ? SM_SentinelZero : (i + j + Ofs));
                        ~~~~~~~~~~~~^~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
cc1plus: all warnings being treated as errors

llvm-svn: 364507
2019-06-27 08:16:18 +00:00
Craig Topper 3d12971e1c [X86] Rework the logic in LowerBuildVectorv16i8 to make better use of any_extend and break false dependencies. Other improvements
This patch rewrites the loop iteration to only visit every other element starting with element 0. And we work on the "even" element and "next" element at the same time. The "First" logic has been moved to the bottom of the loop and doesn't run on every element. I believe it could create dangling nodes previously since we didn't check if we were going to use SCALAR_TO_VECTOR for the first insertion. I got rid of the "First" variable and just do a null check on V which should be equivalent. We also no longer use undef as the starting V for vectors with no zeroes to avoid false dependencies. This matches v8i16.

I've changed all the extends and OR operations to use MVT::i32 since that's what they'll be promoted to anyway. I've tried to use zero_extend only when necessary and use any_extend otherwise. This resulted in some improvements in tests where we are now able to promote aligned (i32 (extload i8)) to a 32-bit load.

Differential Revision: https://reviews.llvm.org/D63702

llvm-svn: 364469
2019-06-26 20:16:19 +00:00
Craig Topper afa58b6ba1 [X86] Remove isTypePromotionOfi1ZeroUpBits and its helpers.
This was trying to optimize concat_vectors with zero of setcc or
kand instructions. But I think it produced the same code we
produce for a concat_vectors with 0 even if it doesn't come from
one of those operations.

llvm-svn: 364463
2019-06-26 19:45:48 +00:00
Simon Pilgrim dfe079ffbf [X86][SSE] getFauxShuffleMask - handle OR(x,y) where x and y have no overlapping bits
Create a per-byte shuffle mask based on the computeKnownBits from each operand - if for each byte we have a known zero (or both) then it can be safely blended.

Fixes PR41545

llvm-svn: 364458
2019-06-26 18:21:26 +00:00
Simon Pilgrim 435ee9fb1f [X86][SSE] X86TargetLowering::isCommutativeBinOp - add PMULDQ
Allows narrowInsertExtractVectorBinOp to reduce vector size instead of the more restricted SimplifyDemandedVectorEltsForTargetNode

llvm-svn: 364434
2019-06-26 14:58:11 +00:00
Simon Pilgrim 6b687bf681 [X86][SSE] X86TargetLowering::isCommutativeBinOp - add PCMPEQ
Allows narrowInsertExtractVectorBinOp to reduce vector size

llvm-svn: 364432
2019-06-26 14:40:49 +00:00
Simon Pilgrim b13c6f1a9d [X86][SSE] X86TargetLowering::isBinOp - add PCMPGT
Allows narrowInsertExtractVectorBinOp to reduce vector size

llvm-svn: 364431
2019-06-26 14:34:41 +00:00
Simon Pilgrim 24f96a0eee [X86] shouldScalarizeBinop - never scalarize target opcodes.
We have (almost) no target opcodes that have scalar/vector equivalents - for now assume we can't scalarize them (we can add exceptions if we need to).

llvm-svn: 364429
2019-06-26 14:21:29 +00:00
Hans Wennborg 6876de90e8 Fix the build after r364401
It was failing with:

/b/s/w/ir/cache/builder/src/third_party/llvm/llvm/lib/Target/X86/X86ISelLowering.cpp:18772:66:
error: call of overloaded 'makeArrayRef(<brace-enclosed initializer list>)' is ambiguous
     scaleShuffleMask<int>(Scale, makeArrayRef<int>({ 0, 2, 1, 3 }), Mask);
                                                                  ^
/b/s/w/ir/cache/builder/src/third_party/llvm/llvm/lib/Target/X86/X86ISelLowering.cpp:18772:66: note: candidates are:
In file included from /b/s/w/ir/cache/builder/src/third_party/llvm/llvm/include/llvm/CodeGen/MachineFunction.h:20:0,
                 from /b/s/w/ir/cache/builder/src/third_party/llvm/llvm/include/llvm/CodeGen/CallingConvLower.h:19,
                 from /b/s/w/ir/cache/builder/src/third_party/llvm/llvm/lib/Target/X86/X86ISelLowering.h:17,
                 from /b/s/w/ir/cache/builder/src/third_party/llvm/llvm/lib/Target/X86/X86ISelLowering.cpp:14:
/b/s/w/ir/cache/builder/src/third_party/llvm/llvm/include/llvm/ADT/ArrayRef.h:480:15:
note: llvm::ArrayRef<T> llvm::makeArrayRef(const std::vector<_RealType>&) [with T = int]
   ArrayRef<T> makeArrayRef(const std::vector<T> &Vec) {
               ^
/b/s/w/ir/cache/builder/src/third_party/llvm/llvm/include/llvm/ADT/ArrayRef.h:485:37:
note: llvm::ArrayRef<T> llvm::makeArrayRef(const llvm::ArrayRef<T>&) [with T = int]
   template <typename T> ArrayRef<T> makeArrayRef(const ArrayRef<T> &Vec) {
                                     ^

llvm-svn: 364414
2019-06-26 11:56:38 +00:00
Simon Pilgrim c0711af7f9 [X86][AVX] combineExtractSubvector - 'little to big' extract_subvector(bitcast()) support
Ideally this needs to be a generic combine in DAGCombiner::visitEXTRACT_SUBVECTOR but there are some nasty regressions in aarch64 due to neon shuffles not handling bitcasts at all.....

llvm-svn: 364407
2019-06-26 11:21:09 +00:00
Simon Pilgrim 3845a4f849 [X86][AVX] truncateVectorWithPACK - avoid bitcasted shuffles
truncateVectorWithPACK is often used in conjunction with ComputeNumSignBits which struggles when peeking through bitcasts.

This fix tries to avoid bitcast(shuffle(bitcast())) patterns in the 256-bit 64-bit sublane shuffles so we can still see through at least until lowering when the shuffles will need to be bitcasted to widen the shuffle type.

llvm-svn: 364401
2019-06-26 09:50:11 +00:00
Craig Topper 14ea14ae85 [X86] Add a DAG combine to turn vzmovl+load into vzload if the load isn't volatile. Remove isel patterns for vzmovl+load
We currently have some isel patterns for treating vzmovl+load the same as vzload, but that shrinks the load which we shouldn't do if the load is volatile.

Rather than adding isel checks for volatile, this patch removes the patterns and teaches DAG combine to merge them into vzload when it's legal to do so.

Differential Revision: https://reviews.llvm.org/D63665

llvm-svn: 364333
2019-06-25 17:08:26 +00:00
Simon Pilgrim aae4b68703 [X86] lowerShuffleAsSpecificZeroOrAnyExtend - add ANY_EXTEND TODO.
lowerShuffleAsSpecificZeroOrAnyExtend should be able to lower to ANY_EXTEND_VECTOR_INREG as well as ZERO_EXTEND_VECTOR_INREG.

llvm-svn: 364313
2019-06-25 13:36:53 +00:00
Craig Topper 7fccb2ac5e [X86] Don't use a vzext_movl in LowerBuildVectorv16i8/LowerBuildVectorv8i16 if there are no zeroes in the vector we're building.
In LowerBuildVectorv16i8 we took care to use an any_extend if the first pair is in the lower 16-bits of the vector and no elements are 0. So bits [31:16] will be undefined. But we still emitted a vzext_movl to ensure that bits [127:32] are 0. If we don't need any zeroes we should be consistent and make all of 127:16 undefined.

In LowerBuildVectorv8i16 we can just delete the vzext_movl code because we only use the scalar_to_vector when there are no zeroes. So the vzext_movl is always unnecessary.

Found while investigating whether (vzext_movl (scalar_to_vector (loadi32))) patterns are necessary. At least one of the cases where they were necessary was where the loadi32 matched a 32-bit aligned 16-bit extload. Seemed weird that we required vzext_movl for that case.

Differential Revision: https://reviews.llvm.org/D63700

llvm-svn: 364207
2019-06-24 17:28:41 +00:00
Craig Topper 033774e144 [X86] Cleanups and safety checks around the isFNEG
This patch does a few things to start cleaning up the isFNEG function.

-Remove the Op0/Op1 peekThroughBitcast calls that seem unnecessary. getTargetConstantBitsFromNode has its own peekThroughBitcast inside. And we have a separate peekThroughBitcast on the return value.
-Add a check of the scalar size after the first peekThroughBitcast to ensure we haven't changed the element size and just did something like f32->i32 or f64->i64.
-Remove an unnecessary check that Op1's type is floating point after the peekThroughBitcast. We're just going to look for a bit pattern from a constant. We don't care about its type.
-Add VT checks on several places that consume the return value of isFNEG. Due to the peekThroughBitcasts inside, the type of the return value isn't guaranteed. So it's not safe to use it to build other nodes without ensuring the type matches the type being used to build the node. We might be able to replace these checks with bitcasts instead, but I don't have a test case so a bail out check seemed better for now.

Differential Revision: https://reviews.llvm.org/D63683

llvm-svn: 364206
2019-06-24 17:28:26 +00:00
Craig Topper e8da65c698 [X86] Turn v16i16->v16i8 truncate+store into a any_extend+truncstore if we avx512f, but not avx512bw.
Ideally we'd be able to represent this truncate as an any_extend to
v16i32 and a truncate, but SelectionDAG doesn't know how to not
fold those together.

We have isel patterns to use a vpmovzxwd+vpmovdb for the truncate,
but we aren't able to simultaneously fold the load and the store
from the isel pattern. By pulling the truncate into the store we
can successfully hide it from the DAG combiner. Then we can isel
pattern match the truncstore and load+any_extend separately.

llvm-svn: 364163
2019-06-23 23:51:21 +00:00
Simon Pilgrim a962c1bc0f [X86][SSE] Fold extract_subvector(vselect(x,y,z),0) -> vselect(extract_subvector(x,0),extract_subvector(y,0),extract_subvector(z,0))
llvm-svn: 364136
2019-06-22 17:57:01 +00:00
Craig Topper 4649a051bf [X86] Add DAG combine to turn (vzmovl (insert_subvector undef, X, 0)) into (insert_subvector allzeros, (vzmovl X), 0)
128/256 bit scalar_to_vectors are canonicalized to (insert_subvector undef, (scalar_to_vector), 0). We have isel patterns that try to match this pattern being used by a vzmovl to use a 128-bit instruction and a subreg_to_reg.

This patch detects the insert_subvector undef portion of this and pulls it through the vzmovl, creating a narrower vzmovl and an insert_subvector allzeroes. We can then match the insertsubvector into a subreg_to_reg operation by itself. Then we can fall back on existing (vzmovl (scalar_to_vector)) patterns.

Note, while the scalar_to_vector case is the motivating case I didn't restrict to just that case. I'm also wondering about shrinking any 256/512 vzmovl to an extract_subvector+vzmovl+insert_subvector(allzeros) but I fear that would have bad implications to shuffle combining.

I also think there is more canonicalization we can do with vzmovl with loads or scalar_to_vector with loads to create vzload.

Differential Revision: https://reviews.llvm.org/D63512

llvm-svn: 364095
2019-06-21 19:10:21 +00:00
Craig Topper 4569cdbcf5 [X86] Don't mark v64i8/v32i16 ISD::SELECT as custom unless they are legal types.
We don't have any Custom handling during type legalization. Only
operation legalization.

Fixes PR42355

llvm-svn: 364093
2019-06-21 18:50:00 +00:00
Craig Topper ce6c06dfdd [X86] Add a debug print of the node in the default case for unhandled opcodes in ReplaceNodeResults.
This should be unreachable, but bugs can make it reachable. This
adds a debug print so we can see the bad node in the output when
the llvm_unreachable triggers.

llvm-svn: 364091
2019-06-21 18:49:21 +00:00
Simon Pilgrim 5dba4ed208 [X86][AVX] Combine INSERT_SUBVECTOR(SRC0, EXTRACT_SUBVECTOR(SRC1)) as shuffle
Subvector shuffling often ends up as insert/extract subvector.

llvm-svn: 364090
2019-06-21 18:35:04 +00:00
Simon Pilgrim 96e77ce626 [X86] isBinOp - move commutative ops to isCommutativeBinOp. NFCI.
TargetLoweringBase::isBinOp checks isCommutativeBinOp as a fallback, so don't duplicate.

llvm-svn: 364072
2019-06-21 16:23:28 +00:00
Simon Pilgrim 36a999ffb8 [X86] X86ISD::ANDNP is a (non-commutative) binop
The sat add/sub tests still have unnecessary extract_subvector((vandnps ymm, ymm), 0) uses that should be split to (vandnps (extract_subvector(ymm, 0), extract_subvector(ymm, 0))), but it's getting better.

llvm-svn: 364038
2019-06-21 12:42:39 +00:00
Simon Pilgrim 9184b009cf [X86] createMMXBuildVector - call with BuildVectorSDNode directly. NFCI.
llvm-svn: 364030
2019-06-21 11:25:06 +00:00
Simon Pilgrim c26b8f2afc [X86] combineAndnp - use isNOT instead of manually checking for (XOR x, -1)
llvm-svn: 364026
2019-06-21 11:13:15 +00:00
Simon Pilgrim b5733581c4 [X86] foldVectorXorShiftIntoCmp - use isConstOrConstSplat. NFCI.
Use the isConstOrConstSplat helper instead of inspecting the build vector manually.

llvm-svn: 364024
2019-06-21 10:54:30 +00:00
Simon Pilgrim 771c33e375 [X86][AVX] isNOT - handle concat_vectors(xor X, -1, xor Y, -1) pattern
llvm-svn: 364022
2019-06-21 10:44:15 +00:00
Fangrui Song dc8de6037c Simplify std::lower_bound with llvm::{bsearch,lower_bound}. NFC
llvm-svn: 364006
2019-06-21 05:40:31 +00:00
Simon Pilgrim a4d705e0ef [X86] LowerAVXExtend - handle ANY_EXTEND_VECTOR_INREG lowering as well.
llvm-svn: 363922
2019-06-20 11:31:54 +00:00
Sanjay Patel b5640b6fe8 [x86] avoid vector load narrowing with extracted store uses (PR42305)
This is an exception to the rule that we should prefer xmm ops to ymm ops.
As shown in PR42305:
https://bugs.llvm.org/show_bug.cgi?id=42305
...the store folding opportunity with vextractf128 may result in better
perf by reducing the instruction count.

Differential Revision: https://reviews.llvm.org/D63517

llvm-svn: 363853
2019-06-19 18:13:47 +00:00
Simon Pilgrim 0018b78ef6 [X86][SSE] combineToExtendVectorInReg - add ANY_EXTEND support TODO. NFCI.
So I don't forget - there's a load of yak shaving to do first.

llvm-svn: 363847
2019-06-19 17:42:37 +00:00
Simon Pilgrim 34279db355 [X86][SSE] Combine shuffles to ANY_EXTEND/ANY_EXTEND_VECTOR_INREG.
We already do this for ZERO_EXTEND/ZERO_EXTEND_VECTOR_INREG - this just extends the pattern matcher to recognize cases where we don't need the zeros in the extension.

llvm-svn: 363841
2019-06-19 17:21:15 +00:00
Simon Pilgrim cdc0236e3a [X86] getExtendInVec - take a ISD::*_EXTEND opcode instead of a IsSigned bool flag. NFCI.
Prep work to support ANY_EXTEND/ANY_EXTEND_VECTOR_INREG without needing another flag.

llvm-svn: 363818
2019-06-19 15:18:24 +00:00
Simon Pilgrim d4754cac89 [X86] Add *_EXTEND -> *_EXTEND_VECTOR_INREG opcode conversion helper. NFCI.
Given a *_EXTEND or *_EXTEND_VECTOR_INREG opcode, convert it to *_EXTEND_VECTOR_INREG.

llvm-svn: 363812
2019-06-19 14:54:02 +00:00
Simon Pilgrim 2b309027ed [X86] Merge extract_subvector(*_EXTEND) and extract_subvector(*_EXTEND_VECTOR_INREG) handling. NFCI.
llvm-svn: 363808
2019-06-19 14:25:27 +00:00
Matt Arsenault 9cac4e6d14 Rename ExpandISelPseudo->FinalizeISel, delay register reservation
This allows targets to make more decisions about reserved registers
after isel. For example, now it should be certain there are calls or
stack objects in the frame or not, which could have been introduced by
legalization.

Patch by Matthias Braun

llvm-svn: 363757
2019-06-19 00:25:39 +00:00
Craig Topper 10e6128c62 [X86] Remove unnecessary line that makes v4f32 FP_ROUND Legal. NFC
FP_ROUND defaults to Legal for all MVT types and nothing changes
the v4f32 entry way from this default. If we needed this line
we'd also need one for v8f32 with AVX512 which we don't have.

llvm-svn: 363719
2019-06-18 19:04:03 +00:00
Simon Pilgrim 9c8593934a [X86][AVX] extract_subvector(any_extend(x)) -> any_extend_vector_inreg(x)
Part of fixing the X86 regression noted in D63281 - I've split this into X86 and generic parts - the generic commit will be coming shortly and will fix the vector-reduce-mul-widen.ll regression introduced here.

llvm-svn: 363693
2019-06-18 15:30:50 +00:00
Craig Topper 0e18300802 [X86] Make an assert in LowerSCALAR_TO_VECTOR stricter to make it clear what types are allowed here. NFC
Make it clear that only integer types with i32 or smaller elements should get to this part of the code.

llvm-svn: 363629
2019-06-17 23:08:09 +00:00
Simon Pilgrim 835999e48a [X86][SSE] Scalarize under-aligned XMM vector nt-stores (PR42026)
If a XMM non-temporal store has less than natural alignment, scalarize the vector - with SSE4A we can stay on the vector and use MOVNTSD(f64), else we must move to GPRs and use MOVNTI(i32/i64).

llvm-svn: 363592
2019-06-17 18:20:04 +00:00
Simon Pilgrim bb9adfdb4e [X86][AVX] Split under-aligned vector nt-stores.
If a YMM/ZMM non-temporal store has less than natural alignment, split the vector - either they will be satisfactorily aligned or will continue to be split until they are XMMs - at which point the legalizer will scalarize it.

llvm-svn: 363582
2019-06-17 17:22:38 +00:00
Simon Pilgrim 12cb792d7f [X86] combineLoad - begun making the load split code more generic. NFCI.
This is currently only used for ymm->xmm splitting but we shouldn't hardcode the offsets/alignment.

This is necessary for an upcoming patch to split under-aligned non-temporal vector loads.

llvm-svn: 363570
2019-06-17 15:54:36 +00:00
Simon Pilgrim 454e6b9010 [X86][SSE] Prevent misaligned non-temporal vector load/store combines
For loads, pre-SSE41 we can't perform NT loads at all, and after that we can only perform vector aligned loads, so if the alignment is less than for an xmm we'll just end up using the regular unaligned vector loads anyway.

First step towards fixing PR42026 - the next step for stores will be to use SSE4A movntsd where possible and to avoid the stack spill on SSE2 targets.

Differential Revision: https://reviews.llvm.org/D63246

llvm-svn: 363564
2019-06-17 14:26:10 +00:00
Sanjay Patel d14389c0a5 [x86] split 256-bit vector selects if operands are vector concats
This is similar logic/motivation to the select splitting in D62969.

In D63233, the pattern changes so that we no longer have an extract_subvector of vselect,
but the operands of the select are still being concatenated.

The closest case is represented in either the first or last test diffs here - we have an
extra instruction, but we converted 3-4 ymm instructions into 4-5 xmm instructions.
I think that's the right trade-off for most AVX1 targets.

In the example based on PR37428:
https://bugs.llvm.org/show_bug.cgi?id=37428
...this makes the loop about 30% faster (tested on Haswell by compiling with -mavx).

Differential Revision: https://reviews.llvm.org/D63364

llvm-svn: 363508
2019-06-16 14:04:49 +00:00
Simon Pilgrim fcffc2facc [X86] CombineShuffleWithExtract - handle cases with different vector extract sources
Insert the shorter vector source into an undef vector of the longer vector source's type.

llvm-svn: 363507
2019-06-16 08:00:41 +00:00
Simon Pilgrim 456ca5d7f7 [X86] CombineShuffleWithExtract - assert all src ops types are multiples of rootsize. NFCI.
llvm-svn: 363501
2019-06-15 19:12:44 +00:00
Simon Pilgrim 90e87af303 [X86][AVX] Handle lane-crossing shuffle(extract_subvector(x,c1),extract_subvector(y,c2),m1) shuffles
Pull out the existing (non)lane-crossing fold into a helper lambda and use for lane-crossing unary shuffles as well.

Fixes PR34380

llvm-svn: 363500
2019-06-15 18:30:43 +00:00
Simon Pilgrim 990f3ceb67 [X86][AVX] Decode constant bits from insert_subvector(c1, c2, c3)
This mostly happens due to SimplifyDemandedVectorElts reducing a vector to insert_subvector(undef, c1, 0)

llvm-svn: 363499
2019-06-15 17:05:24 +00:00
Simon Pilgrim 757a2f13fd [X86] Use fresh MemOps when emitting VAARG64
Previously it copied over MachineMemOperands verbatim which caused MOV32rm to have store flags set, and MOV32mr to have load flags set. This fixes some assertions being thrown with EXPENSIVE_CHECKS on.

Committed on behalf of @luke (Luke Lau)

Differential Revision: https://reviews.llvm.org/D62726

llvm-svn: 363268
2019-06-13 14:05:37 +00:00
Simon Pilgrim 0baf136a4d [X86][SSE] Avoid assert for broadcast(horiz-op()) cases for non-f64 cases.
Based on fuzz test from @craig.topper

llvm-svn: 363251
2019-06-13 11:26:21 +00:00
Simon Pilgrim 4e0648a541 [TargetLowering] Add MachineMemOperand::Flags to allowsMemoryAccess tests (PR42123)
As discussed on D62910, we need to check whether particular types of memory access are allowed, not just their alignment/address-space.

This NFC patch adds a MachineMemOperand::Flags argument to allowsMemoryAccess and allowsMisalignedMemoryAccesses, and wires up calls to pass the relevant flags to them.

If people are happy with this approach I can then update X86TargetLowering::allowsMisalignedMemoryAccesses to handle misaligned NT load/stores.

Differential Revision: https://reviews.llvm.org/D63075

llvm-svn: 363179
2019-06-12 17:14:03 +00:00
Simon Pilgrim 5b0e0dd709 [X86][AVX] Fold concat(vpermilps(x,c),vpermilps(y,c)) -> vpermilps(concat(x,y),c)
Handles PSHUFD/PSHUFLW/PSHUFHW (AVX2) + VPERMILPS (AVX1).

An extra AVX1 PSHUFD->VPERMILPS combine will be added in a future commit.

llvm-svn: 363178
2019-06-12 16:38:20 +00:00
Simon Pilgrim 266f43964e [TargetLowering] Add allowsMemoryAccess(MachineMemOperand) helper wrapper. NFCI.
As suggested by @arsenm on D63075 - this adds a TargetLowering::allowsMemoryAccess wrapper that takes a Load/Store node's MachineMemOperand to handle the AddressSpace/Alignment arguments and will also implicitly handle the MachineMemOperand::Flags change in D63075.

llvm-svn: 363048
2019-06-11 11:00:23 +00:00
Craig Topper 9000a72a4b [X86] When promoting i16 compare with immediate to i32, try to use sign_extend for eq/ne if the input is truncated from a type with enough sign bits.
Summary:
Our default behavior is to use sign_extend for signed comparisons and zero_extend for everything else. But for equality we have the freedom to use either extension. If we can prove the input has been truncated from something with enough sign bits, we can use sign_extend instead and let DAG combine optimize it out. A similar rule is used by type legalization in LegalizeIntegerTypes.

This gets rid of the movzx in PR42189. The immediate will still take 4 bytes instead of the 2 bytes plus 0x66 prefix that a cmp di, 32767 would get, but it avoids a length-changing prefix.

Reviewers: RKSimon, spatel, xbolva00

Reviewed By: xbolva00

Subscribers: hiraditya, llvm-commits

Tags: #llvm

Differential Revision: https://reviews.llvm.org/D63032

llvm-svn: 362920
2019-06-10 04:50:12 +00:00
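A hedged, made-up example of the shape discussed above: an i16 equality compare against an immediate where the input was truncated from a value with plenty of sign bits, so promoting via sign_extend lets DAG combine drop the zero-extension (and its movzx).

    // The right shift leaves far more than 16 sign bits, so the truncated
    // value can be compared after a sign extension instead of a movzx.
    bool isTopValue(int x) {
      short t = static_cast<short>(x >> 20);
      return t == 2047; // i16 eq compare with an immediate
    }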
Craig Topper ceb807bbbc [X86] Disable f32->f64 extload when sse2 is enabled
Summary:
We can only use the memory form of cvtss2sd under optsize due to a partial register update. So previously we were emitting 2 instructions for extload when optimizing for speed. Also, due to a late optimization in PreprocessISelDAG, we had to handle (fpextend (loadf32)) under optsize.

This patch forces extload to expand so that it will always be in the (fpextend (loadf32)) form during isel. And when optimizing for speed we can just let each of those pieces select an instruction independently.

Reviewers: spatel, RKSimon

Reviewed By: RKSimon

Subscribers: hiraditya, llvm-commits

Tags: #llvm

Differential Revision: https://reviews.llvm.org/D62710

llvm-svn: 362919
2019-06-10 04:37:16 +00:00
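A trivial illustration of the (fpextend (loadf32)) pattern: when optimizing for speed this can now be selected as a separate load plus a register-to-register cvtss2sd, avoiding the partial-register update of the memory form, while optsize can still fold the load.

    // f32 load followed by fpextend to f64.
    double widen(const float *p) {
      return static_cast<double>(*p);
    }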
Sanjay Patel 6880bceda2 [x86] narrow extract subvector of vector select
This is a potentially large perf win for AVX1 targets because of the way we
auto-vectorize to 256-bit but then expect the backend to legalize/optimize
for the half-implemented AVX1 ISA.

On the motivating example from PR37428 (even though this patch doesn't solve
the vector shift issue):
https://bugs.llvm.org/show_bug.cgi?id=37428
...there's a 16% speedup when compiling with "-mavx" (perf tested on Haswell)
because we eliminate the remaining 256-bit vblendv ops.

I added comments on a couple of tests that require further work. If we have
256-bit logic ops separating the vselect and extract, we should probably narrow
everything to 128-bit, but that requires a larger pattern match.

Differential Revision: https://reviews.llvm.org/D62969

llvm-svn: 362797
2019-06-07 13:17:46 +00:00
Craig Topper 9226ba6b37 [X86] Don't turn avx masked.load with constant mask into masked.load+vselect when passthru value is all zeroes.
This is intended to enable the use of an immediate blend or
more optimal instruction. But if the passthru is zero we don't
need any additional instructions.

llvm-svn: 362675
2019-06-06 05:41:27 +00:00
Sanjay Patel 2bf82879bd [x86] split more 256-bit stores of concatenated vectors
As suggested in D62498 - collectConcatOps() matches both
concat_vectors and insert_subvector patterns, and we see
more test improvements by using the more general match.

llvm-svn: 362620
2019-06-05 16:40:57 +00:00
Simon Pilgrim de586bd1fd [X86][AVX] Generalize split256BitStore to splitVectorStore. NFCI.
Enables us to use this to split 512-bit vectors in future patches.

llvm-svn: 362617
2019-06-05 16:14:14 +00:00
Simon Pilgrim 886a55eaa0 [X86][AVX] combineX86ShuffleChain - combine shuffle(extractsubvector(x),extractsubvector(y))
We already handle the case where we combine shuffle(extractsubvector(x),extractsubvector(x)); this relaxes the requirement to permit different sources as long as they have the same value type.

This causes a couple of cases where the VPERMV3 binary shuffles occur at a wider width than before, which I intend to improve in future commits - but as only the subvector's mask indices are defined, these will broadcast so we don't see any increase in constant size.

llvm-svn: 362599
2019-06-05 12:56:53 +00:00
Craig Topper 78fdce25a1 [X86] Cleanup convertIntLogicToFPLogic a little. NFCI
-Use early returns to reduce indentation
-Replace multiple ifs with a switch.
-Replace an assert with an llvm_unreachable default in the switch.
-Check that the FP type we're going to use for the
 X86ISD::FAND/FOR/FXOR is legal rather than checking that the
 integer type matches the width of a legal scalar fp type. This all
 runs after legalization so it shouldn't really matter, but making
 sure we're using a valid type in the X86ISD node is really
 what's important.

llvm-svn: 362565
2019-06-05 01:00:34 +00:00
Benjamin Kramer 03ff1b3c30 [X86] Fold single-use variable into assert. NFC.
Avoids an unused variable warning in Release builds.

llvm-svn: 362534
2019-06-04 18:01:07 +00:00
Sanjay Patel 606eb2367f [x86] split 256-bit store of concatenated vectors
This shows up as a side issue to the main problem for the AVX target example from PR37428:
https://bugs.llvm.org/show_bug.cgi?id=37428 - https://godbolt.org/z/7tpRa3

But as we can see in the pile of existing test diffs, it's actually a widespread problem
that affects any AVX or later target. Apart from a couple of oddballs, I think these are
all improvements for the reasons stated in the code comment: we do not want to enable YMM
unnecessarily (avoid vzeroupper and frequency throttling) and some cores split 256-bit
stores anyway.

We could say that MergeConsecutiveStores() is going overboard on some of these examples,
but that won't solve the problem completely. But that is a reason I'm proposing this as
a lowering rather than a combine: we will infinite loop fighting the merge code if we try
this earlier.

Differential Revision: https://reviews.llvm.org/D62498

llvm-svn: 362524
2019-06-04 16:40:04 +00:00
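A hedged illustration of the store-of-concat shape (the function is made up): with this lowering the 256-bit store of two concatenated 128-bit halves is emitted as two 128-bit stores, so no YMM value needs to be materialized at all.

    #include <immintrin.h>

    void storeConcat(float *dst, __m128 lo, __m128 hi) {
      // concat_vectors(lo, hi) as a 256-bit value...
      __m256 v = _mm256_insertf128_ps(_mm256_castps128_ps256(lo), hi, 1);
      // ...stored to memory; now lowered as two 128-bit stores.
      _mm256_storeu_ps(dst, v);
    }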
Simon Pilgrim a6e289e9f8 [X86][SSE] Pulled out the (sub (xor X, M), M) 'ConditionalNegate' pattern match code. NFCI.
As discussed on D62777 - we should be able to use this in more SSE41+ cases as well but that requires us to separate it from the OR(AND(),ANDN()) matcher.

llvm-svn: 362504
2019-06-04 15:02:33 +00:00
Simon Pilgrim 71a39bcf68 [X86] isHorizontalBinOp - add extract_subvector(shuffle(x)) handling (PR39921)
Lets us match horizontal op patterns on fast-variable-shuffle targets (Haswell etc.)

llvm-svn: 362327
2019-06-02 15:47:49 +00:00
Simon Pilgrim 7a869e7036 [DAGCombine] Fold insert_subvector(bitcast(x),bitcast(y),c1) -> bitcast(insert_subvector(x,y),c2)
Move this combine from x86 into generic DAGCombine, which currently only manages cases where the bitcast is between types of the same scalarsize.

Differential Revision: https://reviews.llvm.org/D59188

llvm-svn: 362324
2019-06-02 14:42:11 +00:00
Pengfei Wang 2e67d0c842 [X86] Add VP2INTERSECT instructions
Support Intel AVX512 VP2INTERSECT instructions in llvm

Patch by Xiang Zhang (xiangzhangllvm)

Differential Revision: https://reviews.llvm.org/D62366

llvm-svn: 362188
2019-05-31 02:50:41 +00:00
Craig Topper d6b74cc859 [X86] Remove code that unnecessarily sets EXTLOAD with src type of v2f32/v4f32/v8f32 as Legal for SSE2/AVX/AVX512 respectively. NFC
The LoadExt table defaults to all combinations being Legal. For
vector types, only src VTs with an i1 element type were ever changed.
So we don't need to mark them legal manually.

llvm-svn: 362170
2019-05-30 22:29:06 +00:00
Simon Pilgrim 32aac1727a [X86][SSE] Improve bool vector extload (PR26091)
We already have good codegen for (vXiY *ext(vXi1 bitcast(iX))) cases; this patch uses it for loads of vXi1 types as well - changing the load into an iX integer load, and bitcasting so that combineToExtendBoolVectorInReg can then use it.

Differential Revision: https://reviews.llvm.org/D62449

llvm-svn: 362081
2019-05-30 10:25:20 +00:00
Pengfei Wang 1f67d94279 [X86] Add ENQCMD instructions
For more details about these instructions, please refer to the latest
ISE document:
https://software.intel.com/en-us/download/intel-architecture-instruction-set-extensions-programming-reference.

Patch by Tianqing Wang (tianqing)

Differential Revision: https://reviews.llvm.org/D62281

llvm-svn: 362053
2019-05-30 03:59:16 +00:00
Adhemerval Zanella 6d7bf5e8df [CodeGen] Add lrint/llrint builtins
This patch adds the ISD::LRINT and ISD::LLRINT opcodes along with new
intrinsics.  The changes are straightforward, as for the other
floating-point rounding functions, with just some adjustments
required to handle the return value being an integer.

The idea is to optimize lrint/llrint generation for AArch64
in a subsequent patch.  The current semantics just route it to the libm
symbol.

Reviewed By: craig.topper

Differential Revision: https://reviews.llvm.org/D62017

llvm-svn: 361875
2019-05-28 20:47:44 +00:00
Sanjay Patel f7980e727f Revert "[x86] split 256-bit store of concatenated vectors"
This reverts commit d5a8637072.

Most likely suspect for this bot failure:
http://lab.llvm.org:8011/builders/clang-cmake-x86_64-avx2-linux/builds/9684

llvm-svn: 361850
2019-05-28 17:37:58 +00:00
Sanjay Patel d5a8637072 [x86] split 256-bit store of concatenated vectors
This shows up as a side issue to the main problem for the AVX target example from PR37428:
https://bugs.llvm.org/show_bug.cgi?id=37428 - https://godbolt.org/z/7tpRa3

But as we can see in the pile of existing test diffs, it's actually a widespread problem
that affects any AVX or later target. Apart from a couple of oddballs, I think these are
all improvements for the reasons stated in the code comment: we do not want to enable YMM
unnecessarily (avoid vzeroupper and frequency throttling) and some cores split 256-bit
stores anyway.

We could say that MergeConsecutiveStores() is going overboard on some of these examples,
but that won't solve the problem completely. But that is the reason I'm proposing this as
a lowering rather than a combine: we will infinite loop fighting the merge code if we try
this earlier.

Differential Revision: https://reviews.llvm.org/D62498

llvm-svn: 361822
2019-05-28 13:54:17 +00:00
Sanjay Patel 6bf4ca9d2e [x86] fix 256-bit vector store splitting to honor 'volatile'
Forking this out of the discussion in D62498
(and assuming that will be committed later, so adding the helper function here).
The LangRef says:
"the backend should never split or merge target-legal volatile load/store instructions."

Differential Revision: https://reviews.llvm.org/D62506

llvm-svn: 361815
2019-05-28 12:58:07 +00:00
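A hedged sketch of the rule being enforced (the function is made up): a target-legal volatile 256-bit store must stay a single 32-byte access and is never split into two 128-bit stores, per the LangRef wording quoted above.

    #include <immintrin.h>

    void storeVolatile(volatile __m256 *dst, __m256 v) {
      *dst = v; // volatile: must remain a single 256-bit store
    }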
Benjamin Kramer 57e267a2e9 [X86] Custom lower CONCAT_VECTORS of v2i1
The generic legalizer cannot handle this. Add an assert instead of
silently miscompiling vectors with elements smaller than 8 bits.

llvm-svn: 361814
2019-05-28 12:52:57 +00:00
Simon Pilgrim a044410f37 [X86][SSE] Add shuffle combining support for ISD::ANY_EXTEND_VECTOR_INREG
Reuses what we already have in place for ISD::ZERO_EXTEND_VECTOR_INREG, just with a different sentinel.

llvm-svn: 361734
2019-05-26 16:00:35 +00:00
Simon Pilgrim 58a8541dcc [X86][AVX] combineBitcastvxi1 - peek through bitops to determine size of original vector
We were only testing for direct SETCC results - this allows us to peek through AND/OR/XOR combinations of the comparison results as well.

There's a missing SEXT(PACKSS) fold that I need to investigate for v8i1 cases before I can enable it there as well.

llvm-svn: 361716
2019-05-26 10:54:23 +00:00
Simon Pilgrim 40fa52b174 [X86] lowerBuildVectorToBitOp - support build_vector(shift()) -> shift(build_vector(),C)
Commonly occurs in sign-extension cases

llvm-svn: 361706
2019-05-25 18:02:17 +00:00
Nikita Popov d87eceda0e [X86] Combine fminnum/fmaxnum with non-nan operand to fmin/fmax
If we have a known non-NaN operand, place it in the second operand
of fmin/fmax, which is the operand that is returned if either input is NaN.

Differential Revision: https://reviews.llvm.org/D62448

llvm-svn: 361704
2019-05-25 16:44:29 +00:00
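A hedged example (assuming fmin lowers to the llvm.minnum intrinsic, e.g. with -fno-math-errno): the constant is known non-NaN, so placing it in the second operand lets a lone minsd implement the fminnum semantics - if x is NaN, minsd returns its second operand, which is exactly the value fminnum should return.

    #include <cmath>

    double clampHigh(double x) {
      return std::fmin(x, 100.0); // 100.0 goes in the NaN-fallback slot
    }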
Simon Pilgrim 95b8d9bbf8 [SelectionDAG] computeKnownBits - support constant pool values from target
This patch adds the overridable TargetLowering::getTargetConstantFromLoad function which allows targets to return any constant value loaded by a LoadSDNode node - only X86 makes use of this so far but everything should be in place for other targets.

computeKnownBits then uses this function to improve codegen, notably vector code after legalization.

A future commit will do the same for ComputeNumSignBits but computeKnownBits sees the bigger benefit.

This required a couple of fixes:
* SimplifyDemandedBits must early-out for getTargetConstantFromLoad cases to prevent infinite loops of constant regeneration (similar to what we already do for BUILD_VECTOR).
* Fix a DAGCombiner::visitTRUNCATE issue as we had trunc(shl(v8i32),v8i16) <-> shl(trunc(v8i16),v8i32) infinite loops after legalization on AVX512 targets.

Differential Revision: https://reviews.llvm.org/D61887

llvm-svn: 361620
2019-05-24 10:03:11 +00:00
Nikita Popov 15df05152d [X86] Don't compare i128 through vector if construction not cheap (PR41971)
Fix for https://bugs.llvm.org/show_bug.cgi?id=41971. Make the
combineVectorSizedSetCCEquality() transform more conservative by
checking that the bitcast to the vector type will be cheap/free
for both operands. I'm considering it cheap if it's a constant,
a load or already a vector. I've dropped the explicit check for
f128 because it should fall out naturally (in the cases where
it'd be detrimental).

Differential Revision: https://reviews.llvm.org/D62220

llvm-svn: 361352
2019-05-22 06:47:06 +00:00
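An illustration of the PR41971 shape: an i128 equality compare where neither operand is a constant, a load, or already a vector, so pushing it through an SSE register is no longer assumed to be profitable and it stays as two 64-bit compares.

    bool equal128(__int128 a, __int128 b) {
      return a == b;
    }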
Craig Topper ed6df47bae [X86] Remove an unneeded ZERO_EXTEND creation from LowerINTRINSIC_W_CHAIN. NFC
We were trying to ZERO_EXTEND from an i8 X86ISD::SETCC to i8 again.

llvm-svn: 361288
2019-05-21 19:03:45 +00:00
Simon Pilgrim 4b82e50315 [X86][SSE] computeKnownBitsForTargetNode - add X86ISD::ANDNP support
Fixes PACKSS-PSHUFB shuffle regressions mentioned on D61692

llvm-svn: 361270
2019-05-21 15:20:24 +00:00
Craig Topper 3164b50af7 [X86] Remove combineShift function. Just dispatch directly to the handler for each flavor from the main switch. NFC
llvm-svn: 361108
2019-05-19 01:01:46 +00:00
Simon Pilgrim 065431c82b [X86][SSE] Fold movmsk(not(x)) -> not(movmsk)
Helps to improve folding of comparisons with movmsk results.

llvm-svn: 361056
2019-05-17 17:56:25 +00:00
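A hedged source-level illustration of the fold (the function is made up): movemask of a bitwise-NOTed compare mask. The vector xor-with-all-ones becomes a scalar NOT of the movmskps result, which then folds into the compare against 0xF.

    #include <immintrin.h>

    bool allNonNegative(__m128 v) {
      __m128 neg = _mm_cmplt_ps(v, _mm_setzero_ps());
      // not(neg): flip every mask bit.
      __m128 notNeg = _mm_xor_ps(neg, _mm_castsi128_ps(_mm_set1_epi32(-1)));
      return _mm_movemask_ps(notNeg) == 0xF; // movmsk(not(x)) -> not(movmsk(x))
    }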
Simon Pilgrim 2c2f8e74b9 [X86][SSE] Match all-of bool scalar reductions into a bitcast/movmsk + cmp.
Same as what we do for vector reductions in combineHorizontalPredicateResult, use movmsk+cmp for scalar and(extract(x,0),extract(x,1)) reduction patterns.

llvm-svn: 361052
2019-05-17 17:25:55 +00:00
Simon Pilgrim 279314e81b [X86][AVX] Remove LowerCTTZ's AVX1 custom vector handling.
We can now rely on generic expansion to handle this.

llvm-svn: 361038
2019-05-17 14:37:19 +00:00
Simon Pilgrim 62c7032c18 [X86][AVX] isNOT - add extract_subvector(xor X, -1) -> extract_subvector(X) fold.
Prep work for the removal of the remaining x86 CTTZ vector lowering.

llvm-svn: 361035
2019-05-17 14:04:56 +00:00
Simon Pilgrim a6d3bd486b [X86] Pull out IsNOT helper. NFCI.
Return the input value for the NOT pattern: (xor X, -1) -> X

llvm-svn: 361012
2019-05-17 10:37:08 +00:00
Reid Kleckner 08c15df29f [X86] Deduplicate symbol lowering logic, NFC
Summary:
This refactors four pieces of code that create SDNodes for references to
symbols:
- normal global address lowering (LEA, MOV, etc)
- callee global address lowering (CALL)
- external symbol address lowering (LEA, MOV, etc)
- external symbol address lowering (CALL)

Each of these pieces of code needs to:
- classify the reference
- lower the symbol
- emit a RIP wrapper if needed
- emit a load if needed
- add offsets if needed

I think handling them all in one place will make the code easier to
maintain in the future.

Reviewers: craig.topper, RKSimon

Subscribers: hiraditya, llvm-commits

Tags: #llvm

Differential Revision: https://reviews.llvm.org/D61690

llvm-svn: 360952
2019-05-16 23:15:26 +00:00
Adhemerval Zanella 73643b5041 [CodeGen] Add lround/llround builtins
This patch adds the ISD::LROUND and ISD::LLROUND opcodes along with new
intrinsics.  The changes are straightforward, as for the other
floating-point rounding functions, with just some adjustments
required to handle the return value being an integer.

The idea is to optimize lround/llround generation for AArch64
in a subsequent patch.  The current semantics just route it to the libm
symbol.

llvm-svn: 360889
2019-05-16 13:15:27 +00:00
Craig Topper 384d46c0d5 [X86] Use OR32mi8Locked instead of LOCK_OR32mi8 in emitLockedStackOp.
They encode the same way, but OR32mi8Locked has hasUnmodeledSideEffects set,
which should be stronger than the mayLoad/mayStore on LOCK_OR32mi8. I think
this makes sense since we are using it as a fence.

This also seems to hide the operation from the speculative load hardening pass
so I've reverted r360511.

llvm-svn: 360747
2019-05-15 04:15:46 +00:00
Philip Reames 658cad1287 [NFC] Reuse a helper function to eliminate duplicate code
llvm-svn: 360740
2019-05-15 01:39:07 +00:00
Philip Reames 445f942fc4 Use an offset from TOS for idempotent rmw locked op lowering
This was the portion split off from D58632 so that it could follow the redzone API cleanup. Note that I changed the preferred offset from -8 to -64. The difference should be very minor, but I thought it might help address one concern which had been previously raised.

Differential Revision: https://reviews.llvm.org/D61862

llvm-svn: 360719
2019-05-14 22:32:42 +00:00
Simon Pilgrim c2d9cfd925 [X86] Disable shouldFoldConstantShiftPairToMask for scalar shifts on AMD targets (PR40758)
D61068 handled vector shifts; this patch does the same for scalars, where there are a similar number of pipes for shifts as for bit ops - this is true almost entirely for AMD targets, where the scalar ALUs are well balanced.

This combine avoids an AND immediate mask, which usually means we reduce encoding size.

Some tests show use of a (slow, scaled) LEA instead of SHL in some cases, but that's due to the particular shift immediates - shift+mask generates these just as easily.

Differential Revision: https://reviews.llvm.org/D61830

llvm-svn: 360684
2019-05-14 15:21:28 +00:00
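A hedged example of the scalar shift-pair shape: clearing the top bits with shl+shr. On the affected AMD targets the two shifts are now kept rather than being folded into an AND with a 32-bit immediate mask, which keeps the encoding smaller.

    unsigned low24(unsigned x) {
      return (x << 8) >> 8; // kept as shl+shr instead of and $0x00ffffff
    }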
Simon Pilgrim 2747ee2c83 [X86] X86TargetLowering::LowerINTRINSIC_WO_CHAIN - ensure rounding control is initialized. NFCI.
Fixes scan-build warnings

llvm-svn: 360664
2019-05-14 11:30:39 +00:00
Philip Reames 3098e44daa [X86] Prefer locked stack op over mfence for seq_cst 64-bit stores on 32-bit targets
This is a follow-on to D58632, with the same logic. Given a memory operation which needs ordering, but doesn't need to modify any particular address, prefer to use a locked stack op over an mfence.

Differential Revision: https://reviews.llvm.org/D61863

llvm-svn: 360649
2019-05-14 04:43:37 +00:00
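A hedged illustration: a seq_cst 64-bit atomic store, which on 32-bit x86 needs a trailing full barrier; with this change that barrier is a locked or to a stack slot rather than an mfence.

    #include <atomic>

    void publish(std::atomic<long long> &x) {
      x.store(42, std::memory_order_seq_cst);
    }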
Sanjay Patel 3a13d970aa [SDAG, x86] allow targets to override test for binop opcodes
This follows the pattern of the existing isCommutativeBinOp().

x86 shows improvements from vector narrowing for the min/max opcodes.

llvm-svn: 360639
2019-05-14 00:39:40 +00:00
Craig Topper e2966473dd [X86] Use ISD::MERGE_VALUES to return from lowerAtomicArith instead of calling ReplaceAllUsesOfValueWith and returning SDValue().
Returning SDValue() makes the caller think that nothing happened and it will
end up executing the Expand path. This generates extra nodes that will need to
be pruned as dead code.

Returning an ISD::MERGE_VALUES will tell the caller that we'd like to make a
change and it will take care of replacing uses. This will prevent falling into
the Expand path.

llvm-svn: 360627
2019-05-13 22:17:13 +00:00
Craig Topper 5f999c2bea [X86] Various type corrections to the code that creates LOCK_OR32mi8/OR32mi8Locked to the stack for idempotent atomic rmw and atomic fence.
These are updates to match how isel table would emit a LOCK_OR32mi8 node.

-Use i32 for the immediate zero even though only 8 bits are encoded.
-Use i16 for segment register.
-Use LOCK_OR32mi8 for idempotent atomic operations in 32-bit mode to match
64-bit mode. I'm not sure why OR32mi8Locked and LOCK_OR32mi8 both exist. The
only difference seems to be that OR32mi8Locked is marked as UnmodeledSideEffects=1.
-Emit an extra i32 result for the flags output.

I don't know if the types here really matter; I just noticed it was inconsistent
with normal behavior.

llvm-svn: 360619
2019-05-13 21:01:24 +00:00
Nick Desaulniers c33f754e74 [TargetLowering] Handle multi depth GEPs w/ inline asm constraints
Summary:
X86TargetLowering::LowerAsmOperandForConstraint had better support than
TargetLowering::LowerAsmOperandForConstraint for arbitrary-depth
getelementptrs for "i", "n", and "s" extended inline assembly
constraints. Hoist its support from the derived class into the base
class.

Link: https://github.com/ClangBuiltLinux/linux/issues/469

Reviewers: echristo, t.p.northover

Reviewed By: t.p.northover

Subscribers: t.p.northover, E5ten, kees, jyknight, nemanjai, javed.absar, eraman, hiraditya, jsji, llvm-commits, void, craig.topper, nathanchance, srhines

Tags: #llvm

Differential Revision: https://reviews.llvm.org/D61560

llvm-svn: 360604
2019-05-13 17:27:44 +00:00
Simon Pilgrim 73aee29095 [X86][SSE] LowerBuildVectorv4x32 - don't insert MOVQ for undef elts
Fixes the regression noted in D61782 where a VZEXT_MOVL was being inserted because we weren't discriminating between 'zeroable' and 'all undef' for the upper elts.

Differential Revision: https://reviews.llvm.org/D61782

llvm-svn: 360596
2019-05-13 16:10:11 +00:00
Simon Pilgrim cf5a8eb7cd [X86][SSE] Relax use limits for lowerAddSubToHorizontalOp (PR32433)
Now that we can use HADD/SUB for scalar additions from any pair of extracted elements (D61263), we can relax the one-use limit, as we will be able to merge multiple uses into the same HADD/SUB op.

This exposes a couple of missed opportunities in LowerBuildVectorv4x32 which will be committed separately.

Differential Revision: https://reviews.llvm.org/D61782

llvm-svn: 360594
2019-05-13 16:02:45 +00:00
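A hedged illustration (the function is made up): several scalar sums of extracted element pairs from the same vector. With the one-use limit relaxed, these sums can be merged into lanes of a single HADDPS instead of the lowering bailing out.

    #include <immintrin.h>

    void pairSums(__m128 v, float out[2]) {
      float e0 = _mm_cvtss_f32(v);
      float e1 = _mm_cvtss_f32(_mm_shuffle_ps(v, v, 1));
      float e2 = _mm_cvtss_f32(_mm_shuffle_ps(v, v, 2));
      float e3 = _mm_cvtss_f32(_mm_shuffle_ps(v, v, 3));
      out[0] = e0 + e1; // extract(v,0) + extract(v,1)
      out[1] = e2 + e3; // extract(v,2) + extract(v,3)
    }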
Simon Pilgrim d9aa928603 [X86] Add SimplifyDemandedBits support for PEXTRB/PEXTRW (PR39709)
A test case will be included in a follow-up - it's being used, but it's tricky to show a case that isn't caught at a later stage anyway.

llvm-svn: 360588
2019-05-13 15:31:27 +00:00
Simon Pilgrim a7fc763082 [X86][AVX] Split VZEXT_MOVL ymm/zmm if the upper elements are not demanded.
Removes unnecessary vzeroupper noted in D61806

llvm-svn: 360543
2019-05-12 15:16:29 +00:00
Simon Pilgrim fda6bffd3b [X86][SSE] SimplifyDemandedBits - call PEXTRB/PEXTRW SimplifyDemandedVectorElts as well.
See if we can simplify the demanded vector elts from the extraction before trying to simplify the demanded bits.

This helps us with target shuffles and hops in particular.

llvm-svn: 360535
2019-05-11 21:35:50 +00:00
Simon Pilgrim e4c5b6d9bd [X86][SSE] Add SimplifyDemandedVectorElts HADD/HSUB handling.
Still missing PHADDW/PHSUBW tests because PEXTRW doesn't call SimplifyDemandedVectorElts

llvm-svn: 360526
2019-05-11 16:07:12 +00:00