Commit Graph

6845 Commits

Kadir Cetinkaya d2b6ac6ccd
Revert "[X86] Use X86ISD::SUB instead of X86ISD::CMP in some places."
This reverts commit 8413116bf1.

This seems to be causing crashes while compiling ncurses.
```
$ ./bin/llc bugpoint-reduced-simplified.ll
LLVM ERROR: Cannot emit physreg copy instruction
```

Here are the crashers: https://gist.github.com/kadircet/918f5bb97a2afe048cb875490edba46e

Executing with an llc compiled at 904d54de9b works fine.
2020-02-04 11:22:53 +01:00
Guillaume Chatelet b8144c0536 [NFC] Encapsulate MemOp logic
Summary:
This patch simply introduces functions instead of directly accessing the fields.
This helps introduce additional check logic. A second patch will add simplifying functions.

Reviewers: courbet

Subscribers: arsenm, nemanjai, jvesely, nhaehnle, hiraditya, kbarton, jsji, kerbowa, llvm-commits

Tags: #llvm

Differential Revision: https://reviews.llvm.org/D73945
2020-02-04 10:36:26 +01:00
Craig Topper cd14b4a62b [X86] Remove unneeded code that looks for (and (i8 (X86setcc_c))
I don't believe we use this construct anymore so I don't think
we need to look for it.
2020-02-03 23:18:11 -08:00
Craig Topper 4581d97416 [X86] Remove some uncovered and possibly broken code from combineZext.
This code matches (zext (trunc (setcc_carry))) -> (and (setcc_carry), 1),
but the code never checks what type we're truncating to. An AND
mask of 1 would only make sense if the trunc was to MVT::i1, but
we didn't check for that.

I believe this code is a leftover from when i1 was a legal type.
2020-02-03 22:59:39 -08:00
Craig Topper 8413116bf1 [X86] Use X86ISD::SUB instead of X86ISD::CMP in some places.
Our normal lowering for ISD::SETCC uses X86ISD::SUB to enable
CSE unless the RHS is 0. optimizeCompareInstr, called by the peephole
pass, can turn subs with unused results into cmps to clean this up.

This commit makes other places that create X86ISD::CMP have the
same behavior.
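
Conceptually, a compare is just a subtract that only produces flags, so modeling it as X86ISD::SUB lets the flags consumer share an existing subtraction. A C-level illustration of the effect (not the DAG code itself):
```
// With optimization enabled, both lines below can compile to a single
// sub: its integer result feeds *out and its EFLAGS feed the setcc.
int subAndCompare(int a, int b, int *out) {
  *out = a - b; // needs the arithmetic result
  return a < b; // needs only the flags of that same subtraction
}
```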
2020-02-03 21:01:11 -08:00
Craig Topper c3a47221e0 [X86] Don't emit two X86ISD::COMI/UCOMI nodes when handling comi/ucomi intrinsics.
We were creating two with different operand orders, and then only
using one of them.

Instead just swap the operands when needed and create a single node.
2020-02-03 20:08:01 -08:00
Simon Pilgrim 3ece5a23bd [X86] getTargetShuffleMask - use getConstantOperandVal helper. NFCI. 2020-02-03 18:06:47 +00:00
Simon Pilgrim 8c0e715eb2 [X86] BEXTR SimplifyDemandedBitsForTargetNode - length == 0 -> result = 0 2020-02-03 16:50:03 +00:00
Guillaume Chatelet 333f2ad8b8 [Alignment][NFC] Use Align for getMemcpy/Memmove/Memset
Summary:
This patch is part of a series to introduce an Alignment type.
See this thread for context: http://lists.llvm.org/pipermail/llvm-dev/2019-July/133851.html
See this patch for the introduction of the type: https://reviews.llvm.org/D64790

Reviewers: courbet

Subscribers: arsenm, dschuff, jyknight, sdardis, nemanjai, jvesely, nhaehnle, sbc100, jgravelle-google, hiraditya, aheejin, kbarton, fedor.sergeev, asb, rbar, johnrusso, simoncook, sabuasal, niosHD, jrtc27, MaskRay, zzheng, edward-jones, atanasyan, rogfer01, MartinMosbeck, brucehoult, the_o, PkmX, jocewei, jsji, Jim, lenary, s.egerton, pzheng, sameer.abuasal, apazos, luismarques, kerbowa, llvm-commits

Tags: #llvm

Differential Revision: https://reviews.llvm.org/D73885
2020-02-03 17:13:19 +01:00
Simon Pilgrim 8ead5df0b1 [X86] computeKnownBitsForTargetNode - add BEXTR support (PR39153)
Add a KnownBits::extractBits helper
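
A rough sketch of what such a helper computes, built on APInt::extractBits; the exact upstream name and signature are assumptions here:
```
#include "llvm/Support/KnownBits.h"

// Known bits of the NumBits-wide field starting at bit Offset: extract
// the corresponding slices of the Zero and One masks.
static llvm::KnownBits extractBitsSketch(const llvm::KnownBits &Known,
                                         unsigned NumBits, unsigned Offset) {
  llvm::KnownBits Result(NumBits);
  Result.Zero = Known.Zero.extractBits(NumBits, Offset);
  Result.One = Known.One.extractBits(NumBits, Offset);
  return Result;
}
```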
2020-02-03 15:43:59 +00:00
Simon Pilgrim a9ee3ffbc0 [X86] Move BEXTR DemandedBits handling inside SimplifyDemandedBitsForTargetNode
Some prep work for PR39153.
2020-02-03 15:16:40 +00:00
Craig Topper cf20fde1d1 [X86] Remove a couple unnecessary calls to ConvertCmpIfNecessary.
We only need to call this on floating point comparisons. In this
case these are known to be integer compares. One of them even
has a SUB opcode instead of CMP.
2020-02-02 21:36:51 -08:00
Craig Topper ee85415dbb [X86] Use MVT::f80 for the result type of the FLD used to convert from SSE register to X87 register in FP_TO_INTHelper. 2020-02-02 13:24:37 -08:00
Simon Pilgrim 5d86ac82a6 Fix a few spelling mistakes in comments. NFCI. 2020-02-02 18:27:43 +00:00
Simon Pilgrim 17e91b7dd2 [X86][SSE] combineBitcastvxi1 - add pre-AVX512 v64i1 handling 2020-02-02 18:00:09 +00:00
Guillaume Chatelet 3c89b75f23 [NFC] Introduce a type to model memory operation
Summary: This is a first step before changing the types to llvm::Align and introducing functions to ease client code.

Reviewers: courbet

Subscribers: arsenm, sdardis, nemanjai, jvesely, nhaehnle, hiraditya, kbarton, jrtc27, atanasyan, jsji, kerbowa, llvm-commits

Tags: #llvm

Differential Revision: https://reviews.llvm.org/D73785
2020-01-31 17:29:01 +01:00
Craig Topper 90c31b0f42 [X86] Custom lower ISD::FROUND with SSE4.1 to avoid a libcall.
ISD::FROUND is defined to round to nearest with ties rounding
away from 0. This mode isn't supported in hardware on X86.

But as long as we aren't compiling with trapping math, we can
emulate this with trunc(X + copysign(nextafter(0.5, 0.0), X)).

We have to use nextafter to avoid some corner cases that adding
0.5 would have. For example, if X is nextafter(0.5, 0.0) it should
round to 0.0, but adding 0.5 would need one more bit of mantissa
than can be stored, so it rounds to 1.0. Adding nextafter(0.5, 0.0)
instead will just increase the exponent by 1 and leave the mantissa
as all 1s. This would be nextafter(1.0, 0.0), which truncates to 0.0.
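
As a scalar C++ model of the formula (the lowering emits DAG nodes; this standalone version is just for intuition):
```
#include <cassert>
#include <cmath>

// Round half away from zero: add just-under-0.5 with the sign of x,
// then drop the fraction toward zero.
double emulatedRound(double x) {
  const double almostHalf = std::nextafter(0.5, 0.0);
  return std::trunc(x + std::copysign(almostHalf, x));
}

int main() {
  assert(emulatedRound(0.5) == 1.0);  // ties round away from zero
  assert(emulatedRound(-0.5) == -1.0);
  assert(emulatedRound(std::nextafter(0.5, 0.0)) == 0.0); // corner case
}
```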

Technically this requires -fno-trapping-math, which isn't our default.
But if we care about exceptions we should be using constrained
intrinsics. Constrained intrinsics would use STRICT_FROUND which
won't go through this code.

Fixes PR42195.

Differential Revision: https://reviews.llvm.org/D73607
2020-01-29 09:10:02 -08:00
Craig Topper e5edd641fd [X86] Use a shorter sequence to implement FLT_ROUNDS
This code needs to map from the FPCW 2-bit rounding-mode encoding to the 2-bit encoding defined for FLT_ROUNDS. The previous implementation did the mapping with some clever bit swapping and an add of 1 modulo 4.

This patch instead uses an 8-bit immediate as a lookup table of four 2-bit values. We then use the 2-bit FPCW encoding to index the lookup table with a right shift and an AND. This requires extracting the 2-bit value from FPCW and multiplying it by 2 to make it usable as a shift amount, but it still results in less code.
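
A C++ sketch of the idea; the encodings below are the architectural FPCW and C FLT_ROUNDS conventions, and the 0x2D byte is derived from them here rather than quoted from the patch:
```
#include <cstdint>

// FPCW rounding control: 0 = nearest, 1 = down, 2 = up, 3 = toward zero.
// FLT_ROUNDS values:     0 = toward zero, 1 = nearest, 2 = up, 3 = down.
// Packing the four FLT_ROUNDS values, indexed by FPCW encoding, into one
// byte gives (1 << 0) | (3 << 2) | (2 << 4) | (0 << 6) == 0x2D.
uint32_t fpcwToFltRounds(uint32_t fpcwRC) {
  return (0x2D >> (fpcwRC * 2)) & 3; // shift amount = 2 * FPCW encoding
}
```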

Differential Revision: https://reviews.llvm.org/D73599
2020-01-29 08:56:33 -08:00
Craig Topper ca2abea29a [X86] Use SelectionDAG::getZExtOrTrunc to simplify some code. NFCI 2020-01-28 16:27:59 -08:00
Wang, Pengfei 3d1f0ce3b9 [X86] Add combination for fma and fneg on X86 under strict FP.
Summary: X86 has instructions to calculate fma and fneg at the same time. But we combine the fneg and fma only when fneg is the source operand under strict FP.

Reviewers: craig.topper, andrew.w.kaylor, uweigand, RKSimon, LiuChen3

Subscribers: LuoYuanke, llvm-commits, cfe-commits, jdoerfert, hiraditya

Tags: #llvm

Differential Revision: https://reviews.llvm.org/D72824
2020-01-28 20:09:56 +08:00
Simon Pilgrim 2d5e281b0f [X86][AVX] Add a more aggressive SimplifyMultipleUseDemandedBits to simplify masked store masks.
Fixes a poor codegen issue noticed in PR11210.
2020-01-27 16:44:25 +00:00
Simon Pilgrim fa19d67a2a [X86][AVX] Extend combineCommutableSHUFP to handle v8f32 and v16f32 commutable shufps patterns 2020-01-26 19:04:12 +00:00
Simon Pilgrim 1a81b296cd [X86][SSE] combineCommutableSHUFP - permilps(shufps(load(),x)) --> permilps(shufps(x,load()))
Pull out combineTargetShuffle code added in rG3fd5d1c6e7db into a helper function and extend it to handle shufps(shufps(load(),x),y) and shufps(y,shufps(load(),x)) cases as well.
2020-01-26 14:36:23 +00:00
Craig Topper 3fdd435a4b [X86] Use a macro to convert X86ISD names to strings in getTargetNodeName.
Every case in the switch had a string version of itself. Two
of them had a typo that used : instead of ::.

By using a macro we can automate the string creation and avoid
the possibility of typos like this.

This is similar to what is done on the AMDGPU target.
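
The technique looks roughly like this (a hypothetical sketch; the macro name and the opcodes listed are illustrative):
```
// Stringify the opcode with the preprocessor so the case label and the
// returned string can never get out of sync.
#define NODE_NAME_CASE(NODE)                                                   \
  case X86ISD::NODE:                                                           \
    return "X86ISD::" #NODE;

const char *X86TargetLowering::getTargetNodeName(unsigned Opcode) const {
  switch (Opcode) {
  NODE_NAME_CASE(CMP)
  NODE_NAME_CASE(SUB)
  // ... one NODE_NAME_CASE per opcode ...
  default:
    break;
  }
  return nullptr;
}
#undef NODE_NAME_CASE
```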
2020-01-25 18:27:29 -08:00
Craig Topper 2c1decc040 [X86] Break the loop in LowerReturn into 2 loops. NFCI
I believe that for STRICT_FP I need to use STRICT_FP_EXTEND for the extension to f80 when returning f32/f64 in 32-bit mode with SSE enabled. The STRICT_FP_EXTEND node requires a Chain, and I need to get that node onto the chain before any CopyToRegs are emitted, because all the CopyToRegs are glued and chained together. So I can't put a STRICT_FP_EXTEND on the chain between the glued nodes without also gluing the STRICT_FP_EXTEND.

This patch moves all the extend creation to a first pass, then creates the CopyToRegs and fills out RetOps in a second pass.

Differential Revision: https://reviews.llvm.org/D72665
2020-01-24 14:44:38 -08:00
Simon Pilgrim 3fd5d1c6e7 [X86][SSE] combineTargetShuffle - permilps(shufps(load(),x)) --> permilps(shufps(x,load()))
Moves lowerShuffleWithSHUFPS commutation code from rG30fcd29fe479 to catch cases during combine
2020-01-24 15:23:20 +00:00
Simon Pilgrim 30fcd29fe4 [X86][SSE] lowerShuffleWithSHUFPS - commute '2*V1+2*V2 elements' mask if it allows a load fold
As mentioned on D73023.
2020-01-24 12:04:10 +00:00
Guillaume Chatelet 805c157e8a [Alignment][NFC] Deprecate Align::None()
Summary:
This is a follow up on https://reviews.llvm.org/D71473#inline-647262.
There's a caveat here: `Align(1)` relies on the compiler's understanding of the `Log2_64` implementation to produce good code. One could use `Align()` as a replacement, but I believe it is less clear that the alignment is one in that case.

Reviewers: xbolva00, courbet, bollu

Subscribers: arsenm, dylanmckay, sdardis, nemanjai, jvesely, nhaehnle, hiraditya, kbarton, jrtc27, atanasyan, jsji, Jim, kerbowa, cfe-commits, llvm-commits

Tags: #clang, #llvm

Differential Revision: https://reviews.llvm.org/D73099
2020-01-24 12:53:58 +01:00
Simon Pilgrim 0ec25a0316 [X86] LowerRotate - early out for vector rotates by zero 2020-01-23 17:48:09 +00:00
Guillaume Chatelet 279fa8e006 [Alignment][NFC] Deprecate untyped CreateAlignedLoad
Summary:
This patch is part of a series to introduce an Alignment type.
See this thread for context: http://lists.llvm.org/pipermail/llvm-dev/2019-July/133851.html
See this patch for the introduction of the type: https://reviews.llvm.org/D64790

Reviewers: courbet

Subscribers: arsenm, jvesely, nhaehnle, hiraditya, kerbowa, llvm-commits

Tags: #llvm

Differential Revision: https://reviews.llvm.org/D73260
2020-01-23 13:34:32 +01:00
Sanjay Patel 363d27c871 [x86] fold vperm2x128 to concat of 128-bit high half vectors
vperm (ins ?, X, C), (ins ?, Y, C), 0x31 --> concat X, Y
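
To see why immediate 0x31 concatenates the high halves, here is an illustrative scalar model of vperm2x128's half selection (the zeroing bits 3 and 7 are omitted for brevity):
```
#include <array>
#include <cstdint>
#include <utility>

using V256 = std::array<uint64_t, 4>; // four 64-bit lanes; [0],[1] = low 128 bits

// Each 128-bit half of the result is picked by a 2-bit selector:
// 0 = lo(A), 1 = hi(A), 2 = lo(B), 3 = hi(B).
V256 vperm2x128(const V256 &A, const V256 &B, uint8_t Imm) {
  auto Half = [&](unsigned Sel) {
    const V256 &Src = (Sel & 2) ? B : A;
    unsigned Base = (Sel & 1) ? 2 : 0;
    return std::make_pair(Src[Base], Src[Base + 1]);
  };
  auto Lo = Half(Imm & 3);        // Imm[1:0] picks the result's low half
  auto Hi = Half((Imm >> 4) & 3); // Imm[5:4] picks the result's high half
  return {Lo.first, Lo.second, Hi.first, Hi.second};
}
// Imm = 0x31: low half = hi(A), high half = hi(B) -> concat of high halves.
```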

This is another shuffle problem seen with PR42024:
https://bugs.llvm.org/show_bug.cgi?id=42024

We have this small crack in legalization/lowering/combining/demanded
that allows forming a vperm2f128 of high halves with AVX1 when we
could do better by peeking through the insert_subvector nodes.
AFAICT, it requires IR as shown in the diffs - much larger than legal
vectors - to avoid all of the usual folds.

Another option would be to prevent forming the 256-bit vperm in lowering.

Differential Revision: https://reviews.llvm.org/D73197
2020-01-22 15:35:50 -05:00
Simon Pilgrim 5340434c94 [X86][SSE] combineExtractWithShuffle - extract(bitcast(broadcast(x))) --> x
Removes some unnecessary gpr<-->fpu traffic
2020-01-22 18:02:58 +00:00
Simon Pilgrim a14aa7dabd [X86][SSE] combineExtractWithShuffle - extract(bitcast(scalar_to_vector(x))) --> x
Removes some unnecessary gpr<-->fpu traffic
2020-01-22 16:11:08 +00:00
Simon Pilgrim c784e5451b Use SelectionDAG::getShiftAmountConstant(). NFCI. 2020-01-22 13:52:43 +00:00
Simon Pilgrim 963f268186 [X86][SSE] combineExtractWithShuffle - pull out repeated extract index code. NFCI. 2020-01-22 12:08:58 +00:00
Simon Pilgrim b065902ed4 [X86] combineBT - use SimplifyDemandedBits instead of GetDemandedBits
Another step towards removing SelectionDAG::GetDemandedBits entirely
2020-01-21 14:24:46 +00:00
Simon Pilgrim eaa4548459 [X86][SSE] Add PACKSS SimplifyMultipleUseDemandedBits 'sign bit' handling.
Attempt to use SimplifyMultipleUseDemandedBits to simplify PACKSS if we're only after the sign bit.
2020-01-20 10:48:54 +00:00
Florian Hahn 0ee1db2d1d [X86] Try to avoid casts around logical vector ops recursively.
Currently PromoteMaskArithmetic only looks at a single operation to
skip casts. This means we miss cases where we combine multiple masks.

This patch updates PromoteMaskArithmetic to try to recursively promote
AND/XOR/OR nodes that terminate in truncates of the right size or
constant vectors.

Reviewers: craig.topper, RKSimon, spatel

Reviewed By: RKSimon

Differential Revision: https://reviews.llvm.org/D72524
2020-01-19 17:22:43 -08:00
Craig Topper 5fa2022ec0 [X86] Remove X86ISD::FILD_FLAG and stop gluing nodes together.
Summary:
I think whatever problem the gluing was fixing has long since been fixed. We don't have any of the restrictions on FP stack stuff that existed back when this was first added.

I had to change which type we use for FILD in BuildFILD because most of the isel patterns block f32/f64 instructions when SSE1/SSE2 are enabled. So I needed to use the f80 pattern, but this shouldn't affect the generated code since there is only one FILD instruction anyway. We already use f80 explicitly in other places.

Reviewers: RKSimon, spatel

Reviewed By: RKSimon

Subscribers: andrew.w.kaylor, scanon, hiraditya, llvm-commits

Tags: #llvm

Differential Revision: https://reviews.llvm.org/D72805
2020-01-18 23:44:05 -06:00
Simon Pilgrim 69bc450882 [X86] Rename lowerShuffleAsRotate -> lowerShuffleAsVALIGN
Since it can only ever create VALIGN nodes.
2020-01-18 11:29:14 +00:00
Michael Liao 6d0d86a64d [DAG] Add helper for creating constant vector index with correct type. NFC. 2020-01-18 01:23:36 -05:00
Sanjay Patel 43f60e614a [x86] try harder to form 256-bit unpck*
This is another part of a problem noted in PR42024:
https://bugs.llvm.org/show_bug.cgi?id=42024

The AVX2 code may use awkward 256-bit shuffles vs. the AVX code that gets split
into the expected 128-bit unpack instructions. We have to be selective in
matching the types where we try to do this though. Otherwise, we can end up
with more instructions (in the case of v8x32/v4x64).

Differential Revision: https://reviews.llvm.org/D72575
2020-01-17 10:42:39 -05:00
Craig Topper e445447921 [X86] When handling i64->f32 sint_to_fp on 32-bit targets only bitcast to f64 if sse2 is enabled.
The code is trying to copy the i64 value to an xmm register to
use a 64-bit store so that the 64-bit fild can benefit from
store forwarding.

But this trick only works if f64 is going to be stored in an
XMM register. If we only have SSE1, then only f32 can live in an
XMM register. So this trick just causes two i32 stores, an f64
load into the x87, an f64 store from the x87, and a 64-bit fild. So
we end up with an extra stack temporary and still didn't get store
forwarding.

We might be able to use v2f32 here instead, but I didn't check. I
just wanted the code to make sense.

Found by inspection as I continue to stare too hard at our
int_to_fp conversions.
2020-01-15 18:26:28 -08:00
Craig Topper be8f217b18 [X86] Don't call LowerUINT_TO_FP_i32 for i32->f80 on 32-bit targets with sse2.
We were performing an emulated i32->f64 in the SSE registers, then
storing that value to memory and doing an extload into the X87
domain.

After this patch we'll now just store the i32 to memory along
with an i32 0. Then do a 64-bit FILD to f80 completely in the X87
unit. This matches what we do without SSE.
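
A C-level model of the new sequence, assuming a little-endian 32-bit target (the real code emits the stores and the FILD as DAG nodes):
```
#include <cstdint>
#include <cstring>

// Store the i32 next to an i32 0 so the pair reads back as a
// non-negative signed i64; the 64-bit FILD (modeled here by the
// int64_t conversion) is then exact, with no SSE involvement.
long double u32ToF80(uint32_t X) {
  uint32_t Slot[2] = {X, 0}; // the two i32 stores
  int64_t Wide;
  std::memcpy(&Wide, Slot, sizeof(Wide));
  return static_cast<long double>(Wide); // the 64-bit FILD to f80
}
```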
2020-01-15 00:43:07 -08:00
Reid Kleckner 40cd26c700 [Win64] Handle FP arguments more gracefully under -mno-sse
Pass small FP values in GPRs or stack memory according to the normal
convention. This is what gcc -mno-sse does on Win64.

I adjusted the conditions under which we emit an error to check if the
argument or return value would be passed in an XMM register when SSE is
disabled. This has a side effect of no longer emitting an error for FP
arguments marked 'inreg' when targeting x86 with SSE disabled. Our
calling convention logic was already assigning it to FP0/FP1, and then
we emitted this error. That seems unnecessary, we can ignore 'inreg' and
compile it without SSE.

Reviewers: jyknight, aemerson

Differential Revision: https://reviews.llvm.org/D70465
2020-01-14 17:19:35 -08:00
Craig Topper 76291e1158 [X86] Drop an unneeded FIXME. NFC
The extload on X87 is free.
2020-01-14 17:05:46 -08:00
Craig Topper 57eb56b839 [X86] Swap the 0 and the fudge factor in the constant pool for the 32-bit mode i64->f32/f64/f80 uint_to_fp algorithm.
This allows us to generate better code for selecting the fixup
to load.

Previously, when the sign was set we had to load offset 0, and
when it was clear we had to load offset 4. This required a testl,
setns, zero extend, and finally a multiply by 4. By switching the
offsets we can just shift the sign bit into the lsb and multiply it by 4.
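
In other words (a sketch of the index math, not the emitted assembly):
```
#include <cstdint>

// With the constant-pool entries swapped, the byte offset of the fixup
// to load is just the sign bit scaled by 4.
uint32_t fudgeOffset(uint64_t X) {
  return uint32_t(X >> 63) * 4; // sign bit to the lsb, times 4
}
```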
2020-01-14 17:05:23 -08:00
Craig Topper 98c54fb1fe [X86] Directly emit a BROADCAST_LOAD from constant pool in lowerUINT_TO_FP_vXi32 to avoid double loads seen in D71971
By directly emitting the constants as a constant pool load we seem to avoid the build_vector/extract_subvector combines that resulted in the duplicate loads we had before.

Differential Revision: https://reviews.llvm.org/D72307
2020-01-14 10:50:39 -08:00
Simon Pilgrim 66e39067ed [X86][AVX] Use lowerShuffleAsLanePermuteAndSHUFP to lower binary v4f64 shuffles.
Only perform this if we are shuffling lower and upper lane elements across the lanes (otherwise splitting to lower xmm shuffles would be better).

This is a regression if we shuffle build_vectors, due to getVectorShuffle canonicalizing 'blend of splat' build vectors. For now I've set this not to shuffle build_vector nodes at all to avoid this.
2020-01-12 12:29:41 +00:00
Simon Pilgrim b375f28b0e [X86][AVX] lowerShuffleAsLanePermuteAndSHUFP - only set the demanded elements of the lane mask.
Fixes a cyclic dependency issue with an upcoming patch where getVectorShuffle canonicalizes masks with splat build vector sources.
2020-01-12 09:41:40 +00:00