5971 Commits

Author SHA1 Message Date
James Y Knight 14359ef1b6 [opaque pointer types] Pass value type to LoadInst creation.
This cleans up all LoadInst creation in LLVM to explicitly pass the
value type rather than deriving it from the pointer's element-type.

Differential Revision: https://reviews.llvm.org/D57172

llvm-svn: 352911
2019-02-01 20:44:24 +00:00
James Y Knight 7976eb5838 [opaque pointer types] Pass function types to CallInst creation.
This cleans up all CallInst creation in LLVM to explicitly pass a
function type rather than deriving it from the pointer's element-type.

Differential Revision: https://reviews.llvm.org/D57170

llvm-svn: 352909
2019-02-01 20:43:25 +00:00
Simon Pilgrim 85184017e9 [X86][SSE] Use PSLLDQ/PSRLDQ to mask out zeroable ends of a shuffle
As suggested on PR40318, this patch uses PSLLDQ/PSRLDQ to lower shuffles to zero out the ends of a vector, leaving a sequential inner section.

For pre-SSSE3 we do this for shuffles with zeros at either end (requiring up to 3 shifts), but once PSHUFB is available I've limited this to shuffles with a single zeroable end (2 shifts).
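As a rough illustration (a scalar model, not the lowering code itself), the byte-shift semantics can be simulated on a 16-byte array. The helper names `pslldq`, `psrldq` and `maskZeroableEnds` are hypothetical. The sketch assumes the kept section is a contiguous run `R[I] = V[I + Off]` for `Lo <= I < Hi`, and reaches it with the up-to-three shifts described for pre-SSSE3 targets.

```cpp
#include <array>
#include <cassert>
#include <cstdint>

using V16 = std::array<uint8_t, 16>;

// PSLLDQ model: shift the 128-bit value left by N bytes, i.e. towards higher
// byte indices, filling the vacated low bytes with zero.
V16 pslldq(const V16 &V, int N) {
  V16 R{};
  for (int I = 0; I < 16; ++I)
    R[I] = (I >= N) ? V[I - N] : 0;
  return R;
}

// PSRLDQ model: shift right by N bytes, towards lower byte indices,
// filling the vacated high bytes with zero.
V16 psrldq(const V16 &V, int N) {
  V16 R{};
  for (int I = 0; I < 16; ++I)
    R[I] = (I + N < 16) ? V[I + N] : 0;
  return R;
}

// Hypothetical helper: produce R[I] = V[I + Off] for Lo <= I < Hi and zero
// elsewhere, using at most three whole-register byte shifts.
V16 maskZeroableEnds(const V16 &V, int Lo, int Hi, int Off) {
  assert(0 <= Lo && Lo < Hi && Hi <= 16);
  assert(Lo + Off >= 0 && Hi + Off <= 16);
  int Len = Hi - Lo;
  V16 R = V;
  if (Hi + Off < 16)
    R = pslldq(R, 16 - (Hi + Off)); // push bytes above the section out the top
  if (Len < 16)
    R = psrldq(R, 16 - Len);        // drop bytes below it; section now at [0, Len)
  if (Lo > 0)
    R = pslldq(R, Lo);              // move the section into its final place
  return R;
}
```

With a single zeroable end, one of the first two shifts becomes unnecessary, which matches the 2-shift limit described for targets that have PSHUFB.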

Differential Revision: https://reviews.llvm.org/D56784

llvm-svn: 352883
2019-02-01 16:02:12 +00:00
Simon Pilgrim 1a529f58f9 [X86][AVX] Combine INSERT_SUBVECTOR(SRC0, BITCAST(SHUFFLE(EXTRACT_SUBVECTOR(SRC1)))
Enable peeking through one use bitcasts to the subvector shuffle.

This still depends on the subvector being the same scalar size, but D57514 has already helped with the trickier patterns.

llvm-svn: 352879
2019-02-01 15:31:01 +00:00
James Y Knight 13680223b9 [opaque pointer types] Add a FunctionCallee wrapper type, and use it.
Recommit r352791 after tweaking DerivedTypes.h slightly, so that gcc
doesn't choke on it, hopefully.

Original Message:
The FunctionCallee type is effectively a {FunctionType*,Value*} pair,
and is a useful convenience to enable code to continue passing the
result of getOrInsertFunction() through to EmitCall, even once pointer
types lose their pointee-type.

Then:
- update the CallInst/InvokeInst instruction creation functions to
  take a Callee,
- modify getOrInsertFunction to return FunctionCallee, and
- update all callers appropriately.

One area of particular note is the change to the sanitizer
code. Previously, they had been casting the result of
`getOrInsertFunction` to a `Function*` via
`checkSanitizerInterfaceFunction`, and storing that. That would report
an error if someone had already inserted a function declaration with
a mismatching signature.

However, in general, LLVM allows for such mismatches, as
`getOrInsertFunction` will automatically insert a bitcast if
needed. As part of this cleanup, cause the sanitizer code to do the
same. (It will call its functions using the expected signature,
however they may have been declared.)

Finally, in a small number of locations, callers of
`getOrInsertFunction` actually were expecting/requiring that a brand
new function was being created. In such cases, I've switched them to
Function::Create instead.
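A toy model of the idea, assuming nothing about LLVM's real classes: every `Toy*` name below is hypothetical, and a function type is reduced to a signature string. It shows the call type travelling alongside the callee, and an existing declaration with a different signature being reused behind a (modelled) bitcast rather than reported as an error.

```cpp
#include <cassert>
#include <map>
#include <string>

// Toy stand-in (NOT an LLVM class): a function type is just its signature text.
struct ToyFunctionType {
  std::string Sig;
};

// What the caller actually calls: the declared function, possibly behind a bitcast.
struct ToyValue {
  std::string Name;
  ToyFunctionType DeclaredTy;
  bool IsBitcast;
};

// Models the {FunctionType*, Value*} pairing: the type to call through travels
// with the callee, so no pointee type is needed to build the call.
struct ToyFunctionCallee {
  ToyFunctionType CallTy;
  ToyValue Callee;
};

struct ToyModule {
  std::map<std::string, ToyFunctionType> Decls;

  ToyFunctionCallee getOrInsertFunction(const std::string &Name,
                                        const ToyFunctionType &Ty) {
    auto It = Decls.find(Name);
    if (It == Decls.end()) {
      Decls[Name] = Ty; // brand-new declaration with the requested type
      return {Ty, {Name, Ty, false}};
    }
    // Existing declaration: reuse it and mark a bitcast if the signatures
    // differ, so the caller still calls through the signature it asked for.
    bool Mismatch = It->second.Sig != Ty.Sig;
    return {Ty, {Name, It->second, Mismatch}};
  }
};
```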

Differential Revision: https://reviews.llvm.org/D57315

llvm-svn: 352827
2019-02-01 02:28:03 +00:00
James Y Knight fadf25068e Revert "[opaque pointer types] Add a FunctionCallee wrapper type, and use it."
This reverts commit f47d6b38c7 (r352791).

Seems to run into compilation failures with GCC (but not clang, where
I tested it). Reverting while I investigate.

llvm-svn: 352800
2019-01-31 21:51:58 +00:00
James Y Knight f47d6b38c7 [opaque pointer types] Add a FunctionCallee wrapper type, and use it.
The FunctionCallee type is effectively a {FunctionType*,Value*} pair,
and is a useful convenience to enable code to continue passing the
result of getOrInsertFunction() through to EmitCall, even once pointer
types lose their pointee-type.

Then:
- update the CallInst/InvokeInst instruction creation functions to
  take a Callee,
- modify getOrInsertFunction to return FunctionCallee, and
- update all callers appropriately.

One area of particular note is the change to the sanitizer
code. Previously, they had been casting the result of
`getOrInsertFunction` to a `Function*` via
`checkSanitizerInterfaceFunction`, and storing that. That would report
an error if someone had already inserted a function declaration with
a mismatching signature.

However, in general, LLVM allows for such mismatches, as
`getOrInsertFunction` will automatically insert a bitcast if
needed. As part of this cleanup, cause the sanitizer code to do the
same. (It will call its functions using the expected signature,
however they may have been declared.)

Finally, in a small number of locations, callers of
`getOrInsertFunction` actually were expecting/requiring that a brand
new function was being created. In such cases, I've switched them to
Function::Create instead.

Differential Revision: https://reviews.llvm.org/D57315

llvm-svn: 352791
2019-01-31 20:35:56 +00:00
Simon Pilgrim 00cefe1158 Trim trailing whitespace. NFCI.
llvm-svn: 352775
2019-01-31 17:49:25 +00:00
Simon Pilgrim eb6aef6db3 [X86][AVX] Fold concat(broadcast(x),broadcast(x)) -> broadcast(x)
Differential Revision: https://reviews.llvm.org/D57514

llvm-svn: 352774
2019-01-31 17:48:35 +00:00
Simon Pilgrim d04a2d2d5e [X86][AVX] insert_subvector(bitcast(v), bitcast(s), c1) -> bitcast(insert_subvector(v,s,c2))
Similar to what we already do in DAGCombiner, but this version also handles bitcasts from types with different scalar sizes, which x86 is better at handling.

Differential Revision: https://reviews.llvm.org/D57514

llvm-svn: 352773
2019-01-31 17:38:10 +00:00
Simon Pilgrim 63f3383ece [X86][AVX] Fold broadcast(bitcast(src)) -> bitcast(broadcast(src))
llvm-svn: 352751
2019-01-31 14:04:07 +00:00
Simon Pilgrim a001008a09 [X86] combineExtractWithShuffle - more aggressively peek through bitcasts
Fixes regression introduced by rL352743

llvm-svn: 352745
2019-01-31 11:55:30 +00:00
Simon Pilgrim b96a2c7fed [X86][AVX] Enable AVX1 broadcasts in shuffle combining
Enables 32/64-bit scalar load broadcasts on AVX1 targets

The extractelement-load.ll regression will be fixed shortly in a followup commit.

llvm-svn: 352743
2019-01-31 11:41:10 +00:00
Simon Pilgrim 51c2efc104 [X86][AVX] Fold vt1 concat_vectors(vt2 undef, vt2 broadcast(x)) --> vt1 broadcast(x)
If we're not inserting the broadcast into the lowest subvector then we can avoid the insertion by just performing a larger broadcast.

Avoids a regression when we enable AVX1 broadcasts in shuffle combining
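Why the wider broadcast is legal can be sketched by modelling undef lanes explicitly. In this hypothetical C++ model an undef lane is `std::nullopt`, and a replacement is valid if it preserves every defined lane, so the undef low half may be filled with copies of `x`.

```cpp
#include <cassert>
#include <cstddef>
#include <optional>
#include <vector>

// One lane is either a concrete value or undef (modelled as std::nullopt).
using Lanes = std::vector<std::optional<int>>;

Lanes broadcastLanes(int X, std::size_t N) { return Lanes(N, X); }
Lanes undefLanes(std::size_t N) { return Lanes(N, std::nullopt); }

Lanes concatLanes(const Lanes &Lo, const Lanes &Hi) {
  Lanes R = Lo;
  R.insert(R.end(), Hi.begin(), Hi.end());
  return R;
}

// A replacement is legal if it keeps every defined lane of the original;
// undef lanes may take any value.
bool refines(const Lanes &Orig, const Lanes &Repl) {
  if (Orig.size() != Repl.size())
    return false;
  for (std::size_t I = 0; I < Orig.size(); ++I)
    if (Orig[I] && Orig[I] != Repl[I])
      return false;
  return true;
}
```

If the low half were a different defined broadcast instead of undef, the refinement check fails, which is why the fold needs the undef low subvector.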

llvm-svn: 352742
2019-01-31 11:15:05 +00:00
Craig Topper 8bdc203d4b [X86] Remove handling of ISD::INTRINSIC_WO_CHAIN in ReplaceNodeResults.
I believe this was there to handle avx512bw intrinsics that returned i64 type in 32-bit mode. But all those intrinsics have since been changed to v64i1 results or replaced with generic IR.

llvm-svn: 352698
2019-01-31 00:04:46 +00:00
Simon Pilgrim 317fad5921 [X86][AVX] Prefer to combine shuffle to broadcasts whenever possible
This is the first step towards improving broadcast support on AVX1 targets.

llvm-svn: 352634
2019-01-30 16:19:19 +00:00
Mikael Holmen b792627ce9 Fix compiler warning when using clang 3.6.0
Without the fix we get the following (with -Werror):

../lib/Target/X86/X86ISelLowering.cpp:14181:58: error: suggest braces around initialization of subobject [-Werror,-Wmissing-braces]
  SmallVector<std::array<int, 2>, 2> LaneSrcs(NumLanes, {-1, -1});
                                                         ^~~~~~
                                                         {     }
1 error generated.

llvm-svn: 352455
2019-01-29 06:51:28 +00:00
Craig Topper 390ac61b93 Recommit r352255 "[SelectionDAG][X86] Don't use SEXTLOAD for promoting masked loads in the type legalizer"
This did not cause the buildbot failure it was previously reverted for.

Original commit message:

I'm not sure why we were using SEXTLOAD. EXTLOAD seems more appropriate since we don't care about the upper bits.

This patch changes this and then modifies the X86 post legalization combine to emit an extending shuffle instead of a sign_extend_vector_inreg. Could maybe use an any_extend_vector_inre

On AVX512 targets I think we might be able to use a masked vpmovzx and not have to expand this at all.

llvm-svn: 352433
2019-01-28 21:38:47 +00:00
Simon Pilgrim 2c17512456 [X86][AVX] Remove lowerShuffleByMerging128BitLanes 2-lane restriction
First step towards adding support for 64-bit unary "sublane" handling (a bit like lowerShuffleAsRepeatedMaskAndLanePermute). 

This allows us to add lowerV64I8Shuffle handling.

llvm-svn: 352389
2019-01-28 17:02:35 +00:00
Sanjay Patel 94cca60b82 [x86] allow more shuffle splitting to avoid vpermps (PR40434)
This is tricky to make optimal: sometimes we're better off using
a single wider op, but other times it makes more sense to combine
narrow ops to achieve the same result.

This solves the case from:
https://bugs.llvm.org/show_bug.cgi?id=40434

There's potentially a similar change for vectors with 64-bit elements,
but it needs adjustments similar to rL352333 to avoid creating infinite
loops.

llvm-svn: 352380
2019-01-28 15:51:34 +00:00
Craig Topper 453150bc18 [X86] Add new variadic avx512 compress/expand intrinsics that use vXi1 types for the mask argument.
Remove and autoupgrade the old intrinsics

llvm-svn: 352343
2019-01-28 07:03:03 +00:00
Sanjay Patel ebe6b43aec [x86] add restriction for lowering to vpermps
This transform was added with rL351346, and we had
an escape for shufps, but we also want one for
unpckps vs. vpermps because vpermps doesn't take
an immediate shuffle index operand.

llvm-svn: 352333
2019-01-27 21:53:33 +00:00
Simon Pilgrim 670a6971f8 [X86][SSE] Add UNDEF handling to combineSelect ISD::USUBSAT matching (PR40083)
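A scalar sketch of the pattern involved, assuming the select form `select(x >u y, x - y, 0)` that USUBSAT matching typically targets (the commit title does not spell out the exact operands). It checks that form against a reference saturating subtract on every 8-bit pair, and models the undef relaxation as accepting a zero operand whose lanes are each 0 or undef; which operands the commit actually relaxes is an assumption here, and all names are hypothetical.

```cpp
#include <cassert>
#include <cstdint>
#include <optional>
#include <vector>

// Reference unsigned saturating subtract on one 8-bit lane (clamps at 0).
uint8_t usubsatRef(uint8_t X, uint8_t Y) {
  int D = int(X) - int(Y);
  return D < 0 ? 0 : static_cast<uint8_t>(D);
}

// The select form assumed here: select(X >u Y, X - Y, 0) with a wrapping sub.
uint8_t selectForm(uint8_t X, uint8_t Y) {
  uint8_t Diff = static_cast<uint8_t>(X - Y);
  return X > Y ? Diff : 0;
}

// Assumed undef relaxation: the "zero" operand matches if every lane is
// 0 or undef (std::nullopt), since an undef lane may be chosen to be 0.
bool isZeroOrUndefVector(const std::vector<std::optional<int>> &V) {
  for (const auto &L : V)
    if (L && *L != 0)
      return false;
  return true;
}
```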
llvm-svn: 352330
2019-01-27 21:01:23 +00:00
Simon Pilgrim f10b6623cc [X86][SSE] Permit UNDEFs in combineAddToSUBUS matching (PR40083)
llvm-svn: 352328
2019-01-27 20:36:37 +00:00
Sanjay Patel 5f1fdaa192 [x86] refactor logic in lowerShuffleWithUndefHalf
Although the code is longer, no functional change is intended.
The goal is to untangle the conditions under which we bail out, so
that they are easier to adjust.

llvm-svn: 352320
2019-01-27 18:12:03 +00:00
Simon Pilgrim a914fa4dd8 [X86] combineAddOrSubToADCOrSBB/combineCarryThroughADD - use oneuse for entire SDNode
Fix issue noted in D57281 that only tested the one use for the SDValue (the result flag), not the entire SUB.

I've added the getNode() to make it clearer what is intended than just the -> redirection.
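The difference can be sketched with a toy two-result node. `ToyNode` and `ToyValue` are hypothetical stand-ins that only mirror the idea that a value-level use check counts uses of one result, while a node-level check counts uses of all results. Here the flag result has one use, but the SUB's difference result is also used, so the node as a whole does not have one use.

```cpp
#include <cassert>
#include <vector>

// Toy model (not LLVM's SDNode/SDValue): a node with several results,
// tracking the number of uses of each result separately.
struct ToyNode {
  std::vector<int> UsesPerResult;
  // Node-level check: exactly one use across *all* results.
  bool hasOneUse() const {
    int Total = 0;
    for (int U : UsesPerResult)
      Total += U;
    return Total == 1;
  }
};

struct ToyValue {
  const ToyNode *Node;
  unsigned ResNo;
  // Value-level check: exactly one use of *this* result only.
  bool hasOneUse() const { return Node->UsesPerResult[ResNo] == 1; }
  const ToyNode *getNode() const { return Node; }
};
```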

llvm-svn: 352291
2019-01-26 21:29:16 +00:00
Simon Pilgrim 37a8e65a60 [X86] combineCarryThroughADD - add support for X86::COND_A commutations (PR24545)
As discussed on PR24545, we should try to commute X86::COND_A 'icmp ugt' cases to X86::COND_B 'icmp ult' to more optimally bind the carry flag output to a SBB instruction.
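A minimal model of why the commutation helps, assuming 8-bit operands and modelling only CF: `COND_B` is true exactly when CF is set, whereas `COND_A` also depends on ZF, so rewriting `a >u b` as `b <u a` turns it into a pure carry test that `SBB r, r` can consume. All function names are hypothetical.

```cpp
#include <cassert>
#include <cstdint>

// CF after an 8-bit "CMP A, B" (i.e. A - B): set on unsigned borrow.
bool carryFromSub(uint8_t A, uint8_t B) { return A < B; }

// "SBB R, R" computes R - R - CF = -CF: all-ones if CF is set, else zero.
uint8_t sbbSelf(bool CF) { return static_cast<uint8_t>(0 - (CF ? 1 : 0)); }

// COND_B ('icmp ult A, B') is exactly CF from A - B.
uint8_t maskULT(uint8_t A, uint8_t B) { return sbbSelf(carryFromSub(A, B)); }

// COND_A ('icmp ugt A, B') needs CF == 0 and ZF == 0, which SBB cannot consume
// directly; commuting to 'icmp ult B, A' makes it a plain carry test again.
uint8_t maskUGT(uint8_t A, uint8_t B) { return sbbSelf(carryFromSub(B, A)); }
```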

Differential Revision: https://reviews.llvm.org/D57281

llvm-svn: 352289
2019-01-26 20:23:04 +00:00
Simon Pilgrim b7a15acd38 [X86] Fold X86ISD::SBB(ISD::SUB(X,Y),0) -> X86ISD::SBB(X,Y) (PR25858)
We often generate X86ISD::SBB(X, 0) for carry flag arithmetic.

I had tried to create test cases for the ADC equivalent (which often uses the same pattern) but haven't managed to find anything yet.
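A sketch of the arithmetic identity behind the fold, modelling only the data result of SBB on 8-bit values (the flags produced by the real instruction are not modelled, and `sbb` is a hypothetical helper): subtracting `Y` first and then `0` with borrow gives the same wrapped value as subtracting `Y` with borrow directly.

```cpp
#include <cassert>
#include <cstdint>

// Data result of an 8-bit SBB: A - B - CF, wrapping modulo 256.
// Flag outputs are deliberately not modelled.
uint8_t sbb(uint8_t A, uint8_t B, bool CF) {
  return static_cast<uint8_t>(A - B - (CF ? 1 : 0));
}
```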

Differential Revision: https://reviews.llvm.org/D57169

llvm-svn: 352288
2019-01-26 20:13:44 +00:00
Simon Pilgrim 6162fba57c [X86][SSE] Generalized unsigned compares to support nonsplat constant vectors (PR39859)
llvm-svn: 352283
2019-01-26 16:40:03 +00:00
Sanjay Patel a03c63b77f [x86] add helper for creating a half-width shuffle; NFC
This reduces a bit of duplication between the combining and
lowering places that use it, but the primary motivation is
to make it easier to rearrange the lowering logic and solve
PR40434:
https://bugs.llvm.org/show_bug.cgi?id=40434

llvm-svn: 352280
2019-01-26 16:20:22 +00:00
Craig Topper 58e6b37e62 Revert r352255 "[SelectionDAG][X86] Don't use SEXTLOAD for promoting masked loads in the type legalizer"
This might be breaking an lldb windows buildbot.

llvm-svn: 352268
2019-01-26 02:44:58 +00:00
Craig Topper 7a8e74775c [X86] Add DAG combine to merge vzext_movl with the various fp<->int conversion operations that only write the lower 64-bits of an xmm register and zero the rest.
Summary: We have isel patterns for this, but we're missing some load patterns and all broadcast patterns. A DAG combine seems like a better fit for this.

Reviewers: RKSimon, spatel

Reviewed By: RKSimon

Subscribers: llvm-commits

Differential Revision: https://reviews.llvm.org/D56971

llvm-svn: 352260
2019-01-26 01:17:09 +00:00
Craig Topper b1d3457c03 [SelectionDAG][X86] Don't use SEXTLOAD for promoting masked loads in the type legalizer
Summary:
I'm not sure why we were using SEXTLOAD. EXTLOAD seems more appropriate since we don't care about the upper bits.

This patch changes this and then modifies the X86 post legalization combine to emit an extending shuffle instead of a sign_extend_vector_inreg. Could maybe use an any_extend_vector_inreg, but I just did what we already do in LowerLoad. I think we can actually get rid of this code entirely if we switch to -x86-experimental-vector-widening-legalization.

On AVX512 targets I think we might be able to use a masked vpmovzx and not have to expand this at all.

Reviewers: RKSimon, spatel

Reviewed By: RKSimon

Subscribers: llvm-commits

Differential Revision: https://reviews.llvm.org/D57186

llvm-svn: 352255
2019-01-26 00:26:37 +00:00
Craig Topper 4cf28bad5b [X86] Combine masked store and truncate into masked truncating stores.
We also need to combine to masked truncating with saturation stores, but I'm leaving that for a future patch.

This does regress some tests that used truncate with saturation followed by a masked store. Those now use a truncating store and use min/max to saturate.
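A lane-level sketch under simplifying assumptions (four 32-bit lanes truncated to 8 bits, signed saturation only, hypothetical helper names): the masked truncating store writes only active lanes, and saturation is recovered by clamping with min/max before the plain truncation, as the regressed tests now do.

```cpp
#include <algorithm>
#include <array>
#include <cassert>
#include <cstdint>

// Masked truncating store: write the low 8 bits of each active 32-bit lane;
// inactive lanes leave memory untouched. (The int32 -> int8 conversion wraps
// modulo 256, as guaranteed since C++20 and done by GCC/Clang before that.)
void maskedTruncStore(std::array<int8_t, 4> &Mem, const std::array<int32_t, 4> &Src,
                      const std::array<bool, 4> &Mask) {
  for (int I = 0; I < 4; ++I)
    if (Mask[I])
      Mem[I] = static_cast<int8_t>(Src[I]);
}

// Signed saturation expressed as a clamp (max then min) before truncation.
std::array<int32_t, 4> clampToInt8(const std::array<int32_t, 4> &Src) {
  std::array<int32_t, 4> R{};
  for (int I = 0; I < 4; ++I)
    R[I] = std::min(std::max(Src[I], int32_t(-128)), int32_t(127));
  return R;
}
```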

Differential Revision: https://reviews.llvm.org/D57218

llvm-svn: 352230
2019-01-25 18:37:36 +00:00
Sanjay Patel 0020f8bb23 [x86] simplify logic in lowerShuffleWithUndefHalf(); NFCI
This seems unnecessarily complicated because we gave names to
opposite polarity bools and have code comments that don't really
line up with the logic. 

Step 1: remove UndefUpper and assert that it is the opposite of 
UndefLower after the initial early exit.

llvm-svn: 352217
2019-01-25 17:00:41 +00:00
Simon Pilgrim f56298f4b9 [X86] Simplify X86ISD::ADD/SUB if we don't use the result flag
Simplify to the generic ISD::ADD/SUB if we don't make use of the result flag.

This mainly helps with ADDCARRY/SUBBORROW intrinsics which get expanded to X86ISD::ADD/SUB but could be simplified further.

Noticed in some of the test cases in PR31754

Differential Revision: https://reviews.llvm.org/D57234

llvm-svn: 352210
2019-01-25 15:58:28 +00:00
Sanjay Patel 21aa6ddc14 [x86] narrow a shuffle that doesn't use or set any high elements
This isn't the final fix for our reduction/horizontal codegen, but it takes care 
of a lot of the problems. After we narrow the shuffle, existing combines for 
insert/extract and binops kick in, and we end up with cheaper 128-bit ops.
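A hypothetical helper showing the mask condition, assuming LLVM's usual convention that -1 is undef and that indices `N..2N-1` select from the second operand: the wide shuffle can be rewritten at half width if no defined element lands in the high half of the result and no element is read from the high half of either input.

```cpp
#include <cassert>
#include <optional>
#include <vector>

// Returns a half-width mask if the wide shuffle only reads the low half of
// each input and leaves the high half of the result undef; else nullopt.
std::optional<std::vector<int>> narrowShuffleMask(const std::vector<int> &Mask) {
  int N = static_cast<int>(Mask.size());
  int Half = N / 2;
  std::vector<int> Narrow(Half, -1);
  for (int I = 0; I < N; ++I) {
    int M = Mask[I];
    if (M < 0)
      continue;                   // undef element: no constraint
    if (I >= Half)
      return std::nullopt;        // sets a high result element
    if (M < Half)
      Narrow[I] = M;              // low half of the first input
    else if (M >= N && M < N + Half)
      Narrow[I] = M - Half;       // low half of the second input, renumbered
    else
      return std::nullopt;        // reads a high input element
  }
  return Narrow;
}
```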

The avg and mul reduction tests show an existing shuffle lowering hole for 
AVX2/AVX512. I think in its most minimal form this is:
https://bugs.llvm.org/show_bug.cgi?id=40434
...but we might need multiple fixes to get it right.

Differential Revision: https://reviews.llvm.org/D57156

llvm-svn: 352209
2019-01-25 15:37:42 +00:00
Sanjay Patel 4c304b2923 [x86] move half-size shuffle mask creation to helper; NFC
As noted in D57156, we want to check at least part of
this pattern earlier (in combining), so this will allow
the code to be shared instead of duplicated.

llvm-svn: 352127
2019-01-24 23:12:36 +00:00
Sanjay Patel e524639d72 [x86] rename VectorShuffle -> Shuffle; NFC
This wasn't consistent within the file, which made it harder to search.
Standardize on the shorter name to save some typing.

llvm-svn: 352077
2019-01-24 18:52:12 +00:00
Sanjay Patel e5a0bcf7b8 [x86] add low/high undef half shuffle mask helpers; NFC
This is the most common usage for isUndefInRange, 
so make the code slightly less duplicated and more 
readable.
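A standalone sketch of such helpers on a plain `std::vector<int>` mask, assuming the X86 backend's sentinel convention where -1 means undef (and -2, a known-zero element, does not count as undef). The signatures are modelled on the description and are not copied from the source file.

```cpp
#include <cassert>
#include <vector>

// Assumed sentinel convention: -1 is undef, -2 is a known-zero element.
constexpr int SentinelUndef = -1;

// True if every mask element in [Pos, Pos + Size) is undef.
bool isUndefInRange(const std::vector<int> &Mask, unsigned Pos, unsigned Size) {
  for (unsigned I = Pos; I != Pos + Size; ++I)
    if (Mask[I] != SentinelUndef)
      return false;
  return true;
}

// True if the lower half of the mask is entirely undef.
bool isUndefLowerHalf(const std::vector<int> &Mask) {
  unsigned NumElts = static_cast<unsigned>(Mask.size());
  return isUndefInRange(Mask, 0, NumElts / 2);
}

// True if the upper half of the mask is entirely undef.
bool isUndefUpperHalf(const std::vector<int> &Mask) {
  unsigned NumElts = static_cast<unsigned>(Mask.size());
  return isUndefInRange(Mask, NumElts / 2, NumElts / 2);
}
```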

llvm-svn: 352063
2019-01-24 17:05:02 +00:00
Matt Arsenault a5840c3c39 Codegen support for atomicrmw fadd/fsub
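The commit title does not describe the lowering. One common strategy on targets without a native floating-point atomic add is a compare-exchange loop; the sketch below shows that strategy in portable C++ (`atomicFAdd` and `atomicFSub` are hypothetical names) and is not taken from the commit. Like `atomicrmw`, each function returns the value held before the update.

```cpp
#include <atomic>
#include <cassert>

// Compare-exchange loop for an atomic FP add: read the old value, compute
// Old + V, and retry if another thread changed the location in between.
float atomicFAdd(std::atomic<float> &Loc, float V) {
  float Old = Loc.load();
  // On failure, compare_exchange_weak reloads the current value into Old.
  while (!Loc.compare_exchange_weak(Old, Old + V)) {
  }
  return Old;
}

// Same loop for subtraction.
float atomicFSub(std::atomic<float> &Loc, float V) {
  float Old = Loc.load();
  while (!Loc.compare_exchange_weak(Old, Old - V)) {
  }
  return Old;
}
```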
llvm-svn: 351851
2019-01-22 18:36:06 +00:00
Simon Pilgrim 933673d878 [X86][SSE] Canonicalize OR(AND(X,C),AND(Y,~C)) -> OR(AND(X,C),ANDNP(C,Y))
For constant bit select patterns, replace one AND with an ANDNP, allowing us to reuse the constant mask. Only do this if the mask has multiple uses (to avoid losing load folding) or if we have XOP, as its VPCMOV can handle most folding commutations.

This also requires computeKnownBitsForTargetNode support for X86ISD::ANDNP and X86ISD::FOR to prevent regressions in fabs/fcopysign patterns.
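A bit-level check of the canonicalization on 8-bit values, assuming the usual ANDNP semantics where the first operand is the one inverted (`ANDNP(a, b) = ~a & b`); the helper names are hypothetical. Both forms compute the same bit select, but the second needs only the single constant `C`.

```cpp
#include <cassert>
#include <cstdint>

// ANDNP semantics: the *first* operand is inverted, i.e. ANDNP(A, B) = ~A & B.
uint8_t andnp(uint8_t A, uint8_t B) { return static_cast<uint8_t>(~A & B); }

// Before: OR(AND(X, C), AND(Y, ~C)), which needs both C and ~C as constants.
uint8_t bitSelectTwoMasks(uint8_t X, uint8_t Y, uint8_t C) {
  uint8_t NotC = static_cast<uint8_t>(~C);
  return static_cast<uint8_t>((X & C) | (Y & NotC));
}

// After: OR(AND(X, C), ANDNP(C, Y)), where the same mask C feeds both ops.
uint8_t bitSelectShared(uint8_t X, uint8_t Y, uint8_t C) {
  return static_cast<uint8_t>((X & C) | andnp(C, Y));
}
```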

Differential Revision: https://reviews.llvm.org/D55935

llvm-svn: 351819
2019-01-22 13:44:49 +00:00
Craig Topper bcbdf61078 [X86] Use X86ISD::VFPROUND instead of ISD::FP_ROUND for 256 and 512 bit cvtpd2ps intrinsics.
Summary:
Use X86ISD::VFPROUND in the instruction isel patterns. Add new patterns for ISD::FP_ROUND to maintain support for fptrunc in IR.

In the process I found a couple duplicate isel patterns which I also deleted in this patch.

Reviewers: RKSimon, spatel

Reviewed By: RKSimon

Subscribers: llvm-commits

Differential Revision: https://reviews.llvm.org/D56991

llvm-svn: 351762
2019-01-21 20:14:09 +00:00
Craig Topper c2087d8f3f [X86] Change avx512 COMPRESS and EXPAND lowering to use a single masked node instead of expand/compress+select.
Summary:
For compress, a select node doesn't semantically reflect the behavior of the instruction. The mask would have holes in it, but the resulting write is to contiguous elements at the bottom of the vector.

Furthermore, as far as the compressing and expanding is concerned, the behavior is dependent on the mask. You can't just have an expand/compress node that only reads the input vector. That node would have no meaning by itself.

This all only works because we pattern match the compress/expand+select back to the instruction. But conceivably an optimization of the select could break the pattern and leave something meaningless.

This patch modifies the expand and compress nodes to take the mask and passthru as additional inputs and gets rid of the select altogether.
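A lane-level sketch of merge-masked compress and expand on four `int` lanes (hypothetical helper names; element width and vector length are arbitrary here). It shows why a lane-wise select does not describe compress: the active elements move to the bottom of the vector instead of staying in their own lanes.

```cpp
#include <array>
#include <cassert>

using V4 = std::array<int, 4>;
using M4 = std::array<bool, 4>;

// Compress: active source elements are packed into contiguous low lanes;
// the remaining lanes keep the passthru value.
V4 compress(const V4 &Src, const M4 &Mask, const V4 &Passthru) {
  V4 R = Passthru;
  int Out = 0;
  for (int I = 0; I < 4; ++I)
    if (Mask[I])
      R[Out++] = Src[I];
  return R;
}

// Expand: contiguous low source elements are spread out to the active lanes;
// inactive lanes keep the passthru value.
V4 expand(const V4 &Src, const M4 &Mask, const V4 &Passthru) {
  V4 R = Passthru;
  int In = 0;
  for (int I = 0; I < 4; ++I)
    if (Mask[I])
      R[I] = Src[In++];
  return R;
}

// A plain lane-wise select, for comparison.
V4 laneSelect(const M4 &Mask, const V4 &A, const V4 &B) {
  V4 R{};
  for (int I = 0; I < 4; ++I)
    R[I] = Mask[I] ? A[I] : B[I];
  return R;
}
```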

Reviewers: RKSimon, spatel

Reviewed By: RKSimon

Subscribers: llvm-commits

Differential Revision: https://reviews.llvm.org/D57002

llvm-svn: 351761
2019-01-21 20:02:28 +00:00
Craig Topper 4aa74fff1f [X86] Add masked MCVTSI2P/MCVTUI2P ISD opcodes to model the cvtqq2ps cvtuqq2ps nodes that produce less than 128-bits of results.
These nodes zero the upper half of the result and can't be represented with vselect.
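A sketch of the lane behaviour described above for the 128-bit form, assuming merge masking and a two-element `int64_t` source (the helper name is hypothetical): masked-off low lanes take the passthru value, and the two upper lanes are zero regardless of the mask or passthru.

```cpp
#include <array>
#include <cassert>
#include <cstdint>

// Merge-masked conversion of two int64 lanes into the low two float lanes of
// a 4 x float result; the upper half of the result is always zeroed.
std::array<float, 4> maskedCvtQQ2PS(const std::array<int64_t, 2> &Src,
                                    const std::array<float, 4> &Passthru,
                                    const std::array<bool, 2> &Mask) {
  std::array<float, 4> R{}; // lanes 2 and 3 stay 0.0f
  for (int I = 0; I < 2; ++I)
    R[I] = Mask[I] ? static_cast<float>(Src[I]) : Passthru[I];
  return R;
}
```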

llvm-svn: 351666
2019-01-19 21:26:20 +00:00
Chandler Carruth 2946cd7010 Update the file headers across all of the LLVM projects in the monorepo
to reflect the new license.

We understand that people may be surprised that we're moving the header
entirely to discuss the new license. We checked this carefully with the
Foundation's lawyer and we believe this is the correct approach.

Essentially, all code in the project is now made available by the LLVM
project under our new license, so you will see that the license headers
include that license only. Some of our contributors have contributed
code under our old license, and accordingly, we have retained a copy of
our old license notice in the top-level files in each project and
repository.

llvm-svn: 351636
2019-01-19 08:50:56 +00:00
Reid Kleckner 38f9900aa5 [X86] Deduplicate static calling convention helpers for code size, NFC
Summary:
Right now we include ${TGT}GenCallingConv.inc once for each instruction
selection method implemented by ${TGT}:
- ${TGT}ISelLowering.cpp
- ${TGT}CallLowering.cpp
- ${TGT}FastISel.cpp

Instead, add a mechanism to tablegen for marking a particular convention
as "External", which causes tablegen to emit into the ::llvm namespace,
instead of as a static helper. This allows us to provide a header to
forward declare it, so we can simply call the function from all the
places it is referenced. Typically the calling convention analyzer is
called indirectly, so it doesn't benefit from inlining.

This saves a bit of final binary size, but mostly just saves object file
size:

before  after   diff   artifact
12852K  12492K  -360K  X86ISelLowering.cpp.obj
4640K   4280K   -360K  X86FastISel.cpp.obj
1704K   2092K   +388K  X86CallingConv.cpp.obj
52448K  52336K  -112K  llc.exe

I didn't collect before numbers for X86CallLowering.cpp.obj, which is
for GlobalISel, but we should save 360K there as well.

This patch applies the strategy to the X86 backend, but there is no
reason it couldn't be applied to the other backends that implement
multiple ISel strategies, like AArch64.

Reviewers: craig.topper, hfinkel, efriedma

Subscribers: javed.absar, kristof.beyls, hiraditya, llvm-commits

Differential Revision: https://reviews.llvm.org/D56883

llvm-svn: 351616
2019-01-19 00:33:02 +00:00
Craig Topper 08d3d32ead [X86] Lower avx512f scatter intrinsics to X86MaskedScatterSDNode instead of going directly to MachineSDNode.
This sends these intrinsics through isel in a much more normal way. This should allow addressing mode matching in isel to make better use of the displacement field.

llvm-svn: 351583
2019-01-18 20:14:46 +00:00
Craig Topper b9d4461f9f [X86] Lower avx2/avx512f gather intrinsics to X86MaskedGatherSDNode instead of going directly to MachineSDNode.
This sends these intrinsics through isel in a much more normal way. This should allow addressing mode matching in isel to make better use of the displacement field.

Differential Revision: https://reviews.llvm.org/D56827

llvm-svn: 351570
2019-01-18 18:22:26 +00:00
Sanjay Patel b6c91a1a59 [x86] simplify code for SDValue.getOperand(); NFC
llvm-svn: 351557
2019-01-18 15:55:21 +00:00