Commit Graph

7067 Commits

Craig Topper 3543ac9ab5 [X86] Rewrite LowerBRCOND to remove dead code and handle ISD::SETCC and overflow ops directly.
There's a lot of old leftover code in LowerBRCOND, especially
the detection of AND or OR of X86ISD::SETCC nodes. Those were
needed before LegalizeDAG was changed to visit nodes before
their operands.

It also relied on reversing the output of LowerSETCC to find the
flags producing node to use for the X86ISD::BRCOND node.

Rather than using LowerSETCC this patch uses emitFlagsForSetcc to
handle the integer ISD::SETCC case. This gives the flag producer
and the comparison code to use directly. I've removed the addTest
flag and just produce an X86ISD::BRCOND and return immediately.

Floating point ISD::SETCC case is just an X86ISD::FCMP with special
care for OEQ and UNE derived from the previous code. I've left
f128 out so it will emit a test. And LowerSETCC will be called
later to produce a libcall and X86ISD::SETCC. We have combines
that can merge the test and X86ISD::SETCC.

We need to handle two cases for overflow ops: either they are used
directly, or they have a seteq with 0 or a setne with 1 to invert the overflow.
The old code did not handle the setne 1 case, but I think some
other combines were making up for it.

If we fail to find a condition, we'll wrap an AND with 1 on the
original condition and tell emitFlagsForSetcc to emit a compare
with 0. This will pick up the LowerAndToBT and/or the EmitTest case.
I kept the isTruncWithZeroHighBitsInput call, but we might be able
to fold that into emitFlagsForSetcc.

Differential Revision: https://reviews.llvm.org/D74750
2020-02-20 08:50:18 -08:00
Craig Topper 12cc105f80 [X86] Add DAG combines to form CVTPH2PS/CVTPS2PH from vXf16->vXf32/vXf64 fp_extends and vXf32->vXf16 fp_round.
Only handle power-of-2 element counts for simplicity. Not sure what to do with vXf64->vXf16 fp_round to avoid double rounding.

Differential Revision: https://reviews.llvm.org/D74886
2020-02-20 08:26:17 -08:00
Djordje Todorovic 2f215cf36a Revert "Reland "[DebugInfo] Enable the debug entry values feature by default""
This reverts commit rGfaff707db82d.
A failure was found on an ARM 2-stage buildbot; it needs investigation.
2020-02-20 14:41:39 +01:00
Craig Topper f559cecc3e [X86] Add DCI.isBeforeLegalize() check to the v64i1 constant splitting code in combineStore.
We only need to split after type legalization. If we're before
that, we can just use a wide store and type legalization will split it.

Add a v128i1 test to exercise it post type legalization.
2020-02-19 09:18:16 -08:00
Florian Hahn 216afd3301 [TargetLower] Update shouldFormOverflowOp check if math is used.
On some targets, like SPARC, forming overflow ops is only profitable if
the math result is used: https://godbolt.org/z/DxSmdB
This patch adds a new MathUsed parameter to allow the targets
to make the decision and defaults to only allowing it
if the math result is used. That is the conservative choice.

This patch also updates AArch64ISelLowering, X86ISelLowering,
ARMISelLowering.h, SystemZISelLowering.h to allow forming overflow
ops if the math result is not used. On those targets, using the
overflow intrinsic when only the overflow check is needed still
generates better code.

Reviewers: nikic, RKSimon, lebedev.ri, spatel

Reviewed By: lebedev.ri

Differential Revision: https://reviews.llvm.org/D74722
2020-02-19 11:28:33 +01:00
Djordje Todorovic faff707db8 Reland "[DebugInfo] Enable the debug entry values feature by default"
Differential Revision: https://reviews.llvm.org/D73534
2020-02-19 11:12:26 +01:00
Craig Topper f69a29da5a [X86] Remove vXi1 select optimization from LowerSELECT. Move it to DAG combine. 2020-02-19 00:00:55 -08:00
Craig Topper 0dbc4658d8 [X86] Handle splats in LowerBUILD_VECTORvXi1 by directly emitting scalar selects instead of deferring that to LowerSELECT.
LowerSELECT will detect the constant inputs and convert to scalar
selects, but we can do it directly here.

I might remove some of the code from LowerSELECT and move it to
DAG combine so doing this explicitly will make us less dependent
on it happening in lowering.
2020-02-18 22:39:30 -08:00
Simon Pilgrim d6eef0614f [TargetLowering] Add SimplifyMultipleUseDemandedBits 'all elements' helper wrapper. NFC. 2020-02-18 19:53:50 +00:00
Craig Topper 89ab5c69c8 [X86] Add a helper function to pull some repeated code out of combineGatherScatter. NFC 2020-02-18 11:10:40 -08:00
Djordje Todorovic 2bf44d11cb Revert "Reland "[DebugInfo] Enable the debug entry values feature by default""
This reverts commit rGa82d3e8a6e67.
2020-02-18 16:38:11 +01:00
Djordje Todorovic a82d3e8a6e Reland "[DebugInfo] Enable the debug entry values feature by default"
This patch enables the debug entry values feature.

  - Remove the (CC1) experimental -femit-debug-entry-values option
  - Enable it for x86, arm and aarch64 targets
  - Resolve the test failures
  - Leave the llc experimental option for targets that do not
    support the CallSiteInfo yet

Differential Revision: https://reviews.llvm.org/D73534
2020-02-18 14:41:08 +01:00
Craig Topper e90dc7c48b [X86] Move avx512 code that forces zeros to the false side of vselects above a check for legal types.
This helps this transform occur earlier so we can fold the not
with setcc. If we delay it until after type legalization we might
have introduced instructions to widen the mask if the vselect was
widened. This can prevent the not from making it to the setcc.

We could of course add more DAG combines to handle that, but
moving this earlier is easier.
2020-02-17 22:24:21 -08:00
Craig Topper b0840934a7 [X86] Use isScalarFPTypeInSSEReg to simplify code in LowerSELECT. NFC 2020-02-17 19:43:57 -08:00
Craig Topper 3f4490d384 [X86] Add one use check to '0-x == y --> x+y == 0' in EmitCmp.
I failed to copy it when I moved this code in
b62de210cf.
2020-02-17 18:16:42 -08:00
Craig Topper 43e948c4b7 [X86] Change how the alignment for the stack object is created in LowerFLT_ROUNDS_.
We don't need FrameInfo's concept of the stack alignment. We just
need to tell it the desired alignment, which in this case is 2.
2020-02-17 11:27:34 -08:00
Craig Topper b62de210cf [X86] Move '0-x == y --> x+y == 0' and similar combines to EmitCmp.
AArch64 handles this pattern in its lowering code by emitting
CMN; ARM handles it as an isel pattern.
2020-02-17 11:27:34 -08:00
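
A quick sanity check of the identity behind the combine above, in wrapping 32-bit arithmetic (illustrative C++ only, not the DAG code):

```
#include <cassert>
#include <cstdint>

// In two's complement, 0 - x == y exactly when x + y == 0 (mod 2^32),
// which is why a compare against a negation can be re-expressed as an
// addition compared against zero (the CMN-style form mentioned above).
int main() {
  for (uint32_t x = 0; x < 1000; ++x)
    for (uint32_t y : {0u, 1u, x, 0u - x, 0xFFFFFFFFu})
      assert(((0u - x) == y) == (x + y == 0u));
}
```
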
Nikita Popov 80397d2d12 [IRBuilder] Delete copy constructor
D73835 will make IRBuilder no longer trivially copyable. This patch
deletes the copy constructor in advance, to separate out the breakage.

Currently, the IRBuilder copy constructor is usually used by accident,
not by intention.  In rG7c362b25d7a9 I've fixed a number of cases where
functions accepted IRBuilder rather than IRBuilder &, thus performing
an unnecessary copy. In rG5f7b92b1b4d6 I've fixed cases where an
IRBuilder was copied, while an InsertPointGuard should have been used
instead.

The only non-trivial use of the copy constructor is the
getIRBForDbgInsertion() helper, for which I separated construction and
setting of the insertion point in this patch.

Differential Revision: https://reviews.llvm.org/D74693
2020-02-17 18:14:48 +01:00
Craig Topper 464729cf7c [X86] Remove unnecessary check for null SDValue. NFC 2020-02-16 20:25:24 -08:00
Craig Topper 272d35aef5 [X86] Separate floating point handling out of EmitCmp and emitFlagsForSetcc.
Both of those functions only have a single caller starting
at LowerSETCC. Just handle floating point directly in LowerSETCC.

This removes the need to pass Chain and IsSignaling all the way
down.
2020-02-16 10:51:05 -08:00
Craig Topper d26f11108b [X86] Split X86ISD::CMP into an integer and FP opcode. 2020-02-16 10:10:19 -08:00
Simon Pilgrim b85df2e185 [X86] combineX86ShuffleChain - add support for combining 512-bit shuffles to PALIGNR 2020-02-16 16:13:26 +00:00
Simon Pilgrim c9c1c2b335 [X86] combineX86ShuffleChain - add support for combining 512-bit shuffles to bit shifts 2020-02-16 16:13:25 +00:00
Sanjay Patel e48b536be6 [x86] form broadcast of scalar memop even with >1 use
The unseen logic diff occurs because MayFoldLoad() is defined like this:

static bool MayFoldLoad(SDValue Op) {
  return Op.hasOneUse() && ISD::isNormalLoad(Op.getNode());
}

The test diffs here all seem ok to me on screen/paper, but it's hard to know
if that will lead to universally better perf for all targets. For example,
if a target implements broadcast from mem as multiple uops, we would have to
weigh the potential reduction of instructions and register pressure vs.
possible increase in number of uops. I don't know if we can make a truly
informed decision on this at compile-time.

The motivating case that I'm looking at in PR42024:
https://bugs.llvm.org/show_bug.cgi?id=42024
...resembles the diff in extract-concat.ll, but we're not going to change the
larger example there without at least 1 other fix.

Differential Revision: https://reviews.llvm.org/D74088
2020-02-16 10:32:56 -05:00
Simon Pilgrim 34a054ce71 [X86] combineX86ShuffleChain - add support for combining to X86ISD::ROTLI
Refactors matchShuffleAsBitRotate to allow use by both lowerShuffleAsBitRotate and matchUnaryPermuteShuffle.
2020-02-15 20:04:54 +00:00
Simon Pilgrim 2492075add [X86][SSE] lowerShuffleAsBitRotate - lower vXi8 shuffles to ROTL on pre-SSSE3 targets
Without PSHUFB we are better off using ROTL (expanding to OR(SHL,SRL)) than using the generic v16i8 shuffle lowering - but if we can widen to v8i16 or more then the existing shuffles are still the better option.

REAPPLIED: Original commit rG11c16e71598d was reverted at rGde1d90299b16 as it wasn't accounting for later lowering. This version emits ROTLI or the OR(VSHLI/VSRLI) directly to avoid the issue.
2020-02-14 11:55:18 +00:00
Craig Topper c2e8a421ac [X86] Don't widen 128/256-bit strict compares with vXi1 result to 512-bits on KNL.
If we widen the compare we might trigger a spurious exception from
the garbage data.

We have two choices here: explicitly force the upper bits to zero,
or use a legacy VEX vcmpps/pd instruction and convert the XMM/YMM
result to a mask register.

I've chosen to go with the second option. I'm not sure which is
really best. In some cases we could get rid of the zeroing since
the producing instruction probably already zeroed it, but we lose
the ability to fold a load. So which is best depends on the
surrounding code.

Differential Revision: https://reviews.llvm.org/D74522
2020-02-13 13:26:40 -08:00
Amy Huang de1d90299b Revert "[X86][SSE] lowerShuffleAsBitRotate - lower vXi8 shuffles to ROTL on pre-SSSE3 targets"
This reverts commit 11c16e7159 because it
causes a crash in chromium code. See
https://reviews.llvm.org/rG11c16e71598d51f15b4cfd0f719c4dabcc0bebf7.
2020-02-12 17:00:37 -08:00
Jay Foad 32aac25637 [KnownBits] Introduce anyext instead of passing a flag into zext
Summary:
This was a very odd API, where you had to pass a flag into a zext
function to say whether the extended bits really were zero or not. All
callers passed in a literal true or false.

I think it's much clearer to make the function name reflect the
operation being performed on the value we're tracking (rather than on
the KnownBits Zero and One fields), so zext means the value is being
zero extended and new function anyext means the value is being extended
with unknown bits.

NFC.

Subscribers: hiraditya, llvm-commits

Tags: #llvm

Differential Revision: https://reviews.llvm.org/D74482
2020-02-12 19:06:53 +00:00
Simon Pilgrim ff307c8120 [X86] combineFneg - generalize FMA negations with isNegatibleForFree/getNegatedExpression
This has a really interesting side effect in that it improves some UMAX/UMIN reduction code which had redundant XOR(SHUFFLE(XOR(X,SIGNMASK)),SIGNMASK) patterns - getNegatibleCost recognises them as FNEG(SHUFFLE(FNEG(X))). We have a lot of FNEG patterns bitcasted to the integer domain for XOR signbit twiddling, which is similar to what we do to allow UMAX/UMIN to be lowered using SMAX/SMIN.

Differential Revision: https://reviews.llvm.org/D74231
2020-02-12 16:07:27 +00:00
Simon Pilgrim 9eb426c88c [TargetLowering] Add NegatibleCost enum for isNegatibleForFree return codes
The isNegatibleForFree/getNegatedExpression methods currently rely on a raw char value to indicate whether a negation is beneficial or not.

This patch replaces the char return value with a NegatibleCost enum to more clearly demonstrate what is implied.

It also renames isNegatibleForFree to getNegatibleCost to more accurately reflect what's going on.

Differential Revision: https://reviews.llvm.org/D74221
2020-02-12 11:51:42 +00:00
Djordje Todorovic 97ed706a96 Revert "[DebugInfo] Enable the debug entry values feature by default"
This reverts commit rG9f6ff07f8a39.

Found a test failure on clang-with-thin-lto-ubuntu buildbot.
2020-02-12 11:59:04 +01:00
Djordje Todorovic 9f6ff07f8a [DebugInfo] Enable the debug entry values feature by default
This patch enables the debug entry values feature.

  - Remove the (CC1) experimental -femit-debug-entry-values option
  - Enable it for x86, arm and aarch64 targets
  - Resolve the test failures
  - Leave the llc experimental option for targets that do not
    support the CallSiteInfo yet

Differential Revision: https://reviews.llvm.org/D73534
2020-02-12 10:25:14 +01:00
Craig Topper 0daf9b8e41 [X86][LegalizeTypes] Add SoftPromoteHalf support STRICT_FP_EXTEND and STRICT_FP_ROUND
This adds a strict version of FP16_TO_FP and FP_TO_FP16 and uses
them to implement soft promotion for the half type. This is
enough to provide basic support for __fp16 with strictfp.

Add the necessary X86 support to use VCVTPS2PH/VCVTPH2PS when F16C
is enabled.
2020-02-11 22:30:04 -08:00
Craig Topper 846d0ac43e [X86] Don't disable code in combineHorizontalPredicateResult just because we have avx512
We aren't doing a good job of optimizing AVX512 outside of this code. So remove the bail out for AVX512 and replace with a FIXME. This at least gets us the AVX2 codegen.

Differential Revision: https://reviews.llvm.org/D74431
2020-02-11 14:36:29 -08:00
Simon Pilgrim fa620fc8e2 [X86] combineConcatVectorOps - reuse IsSplat and remove duplicate code. NFC. 2020-02-11 13:37:57 +00:00
Simon Pilgrim 11c16e7159 [X86][SSE] lowerShuffleAsBitRotate - lower vXi8 shuffles to ROTL on pre-SSSE3 targets
Without PSHUFB we are better off using ROTL (expanding to OR(SHL,SRL)) than using the generic v16i8 shuffle lowering - but if we can widen to v8i16 or more then the existing shuffles are still the better option.
2020-02-11 12:21:03 +00:00
Craig Topper 798305d29b [X86] Custom lower ISD::FP16_TO_FP and ISD::FP_TO_FP16 on f16c targets instead of using isel patterns.
We need to use vector instructions for these operations. Previously
we handled this with isel patterns that used extra instructions
and copies to handle the conversions.

Now we use custom lowering to emit the conversions. This allows
them to be pattern matched and optimized on their own. For
example we can now emit vpextrw to store the result if it's going
directly to memory.

I've forced the upper elements of the VCVTPH2PS input to zero to keep some
code similar. Zeroes will be needed for strictfp. I've added a
DAG combine for (fp16_to_fp (fp_to_fp16 X)) to avoid extra
instructions in between to be closer to the previous codegen.

This is a step towards strictfp support for f16 conversions.
2020-02-10 22:01:48 -08:00
Simon Pilgrim f319074824 [X86] combineConcatVectorOps - combine X86ISD::PACKSS ops 2020-02-10 17:48:02 +00:00
Simon Pilgrim 74c0f98cf5 [X86] combineConcatVectorOps - combine X86ISD::VPERMI ops 2020-02-10 17:48:01 +00:00
Simon Pilgrim 2463b8c97d [X86] combineConcatVectorOps - combine VSHLI/VSRAI/VSRLI ops
Non-AVX512BW targets failed to concatenate 256-bit shifts back to 512-bits (split during 512-bit shuffle lowering as they don't have v32i16/v64i8 types).
2020-02-10 16:59:09 +00:00
Simon Pilgrim 06617c4522 [X86] Add lowerShuffleAsBitRotate (PR44379)
As noted on PR44379, we didn't attempt to lower vector shuffles using bit rotations on XOP/AVX512F targets.

This patch lowers to uniform ISD::ROTL nodes - ROTR isn't supported by XOP and they are interchangeable for constant values anyway.

There might be cases where targets without ISD::ROTL support would benefit from this (expanding to SRL+SHL+OR), which I'll investigate in a future patch.

REAPPLIED rGe82e17d4d4ca after reversion at rG39eade73a567 - fixed offset matching in matchShuffleAsBitRotate.
2020-02-10 16:16:56 +00:00
Simon Pilgrim 39eade73a5 Revert rGe82e17d4d4cac8b2df00094e80d5e1cb22795664 - [X86] Add lowerShuffleAsBitRotate (PR44379)
As noted on PR44379, we didn't attempt to lower vector shuffles using bit rotations on XOP/AVX512F targets.

This patch lowers to uniform ISD::ROTL nodes - ROTR isn't supported by XOP and they are interchangeable for constant values anyway.

There might be cases where targets without ISD::ROTL support would benefit from this (expanding to SRL+SHL+OR), which I'll investigate in a future patch.

Also, non-AVX512BW targets fail to concatenate 256-bit rotations back to 512-bits (split during shuffle lowering as they don't have v32i16/v64i8 types).
---
Internal shuffle tests indicate there's a bug somewhere that I haven't been able to track down yet.
2020-02-10 12:14:26 +00:00
Craig Topper 06ba969c9d [X86] Make (insert_vector_elt (v8i16 zerovec), i16 %x, 0) generate the same code as (v8i16 (build_vector %x, 0, 0, 0, 0, 0, 0, 0)).
Instead of using an insrw to element 0, use movzx and movd.

Same for v16i8.
2020-02-09 21:52:11 -08:00
Simon Pilgrim 29e646fe65 [X86] combineConcatVectorOps - combine VROTLI/VROTRI ops
Fix issue mentioned on rGe82e17d4d4ca - non-AVX512BW targets failed to concatenate 256-bit rotations back to 512-bits (split during shuffle lowering as they don't have v32i16/v64i8 types).
2020-02-09 21:50:10 +00:00
Simon Pilgrim e82e17d4d4 [X86] Add lowerShuffleAsBitRotate (PR44379)
As noted on PR44379, we didn't attempt to lower vector shuffles using bit rotations on XOP/AVX512F targets.

This patch lowers to uniform ISD::ROTL nodes - ROTR isn't supported by XOP and they are interchangeable for constant values anyway.

There might be cases where targets without ISD::ROTL support would benefit from this (expanding to SRL+SHL+OR), which I'll investigate in a future patch.

Also, non-AVX512BW targets fail to concatenate 256-bit rotations back to 512-bits (split during shuffle lowering as they don't have v32i16/v64i8 types).
2020-02-09 21:15:03 +00:00
Simon Pilgrim 29621b2534 [X86] Rename matchShuffleAsRotate - matchShuffleAsByteRotate. NFCI.
A matchShuffleAsBitRotate variant will be added soon and we need to make the difference more obvious.
2020-02-09 18:35:50 +00:00
Simon Pilgrim 3ec6de07e9 Fix signed/unsigned warning. 2020-02-09 13:35:03 +00:00
Simon Pilgrim 644d56b432 [X86] Recognise ROTLI/ROTRI rotations as faux shuffles
Allows us to combine rotations with shuffles.

One of many things necessary to fix PR44379 (lowering shuffles to rotations)
2020-02-09 12:25:49 +00:00
serge_sans_paille e67cbac812 Support -fstack-clash-protection for x86
Implement protection against the stack clash attack [0] through inline stack
probing.

Probe stack allocation every PAGE_SIZE during frame lowering or dynamic
allocation to make sure the page guard, if any, is touched when touching the
stack, in a similar manner to GCC[1].

This extends the existing `probe-stack' mechanism with a special value `inline-asm'.
Technically, the former uses a function call before stack allocation, while this
patch provides inlined stack probes and chunk allocation.

Only implemented for x86.

[0] https://www.qualys.com/2017/06/19/stack-clash/stack-clash.txt
[1] https://gcc.gnu.org/ml/gcc-patches/2017-07/msg00556.html

This is a recommit of 39f50da2a3 with a proper LiveIn
declaration, better option handling, and more portable testing.

Differential Revision: https://reviews.llvm.org/D68720
2020-02-09 10:42:45 +01:00
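
A conceptual C++ sketch of the probing scheme described in the commit above (the page size and the helper are assumptions for illustration, not the emitted assembly):

```
#include <cstddef>

// Touch at least one byte in every page-sized chunk of a newly allocated
// stack area so the OS guard page is hit in order and cannot be skipped.
// `area` stands in for the freshly extended stack region.
constexpr std::size_t PAGE_SIZE = 4096; // assumed page size

void probe_chunks(volatile char *area, std::size_t alloc) {
  for (std::size_t off = 0; off < alloc; off += PAGE_SIZE)
    area[off] = 0; // one probe per PAGE_SIZE chunk
}
```
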
serge-sans-paille 4546211600 Revert "Support -fstack-clash-protection for x86"
This reverts commit 0fd51a4554.

Failures:

http://lab.llvm.org:8011/builders/llvm-clang-win-x-armv7l/builds/4354
2020-02-09 10:06:31 +01:00
serge_sans_paille 0fd51a4554 Support -fstack-clash-protection for x86
Implement protection against the stack clash attack [0] through inline stack
probing.

Probe stack allocation every PAGE_SIZE during frame lowering or dynamic
allocation to make sure the page guard, if any, is touched when touching the
stack, in a similar manner to GCC[1].

This extends the existing `probe-stack' mechanism with a special value `inline-asm'.
Technically, the former uses a function call before stack allocation, while this
patch provides inlined stack probes and chunk allocation.

Only implemented for x86.

[0] https://www.qualys.com/2017/06/19/stack-clash/stack-clash.txt
[1] https://gcc.gnu.org/ml/gcc-patches/2017-07/msg00556.html

This is a recommit of 39f50da2a3 with a proper LiveIn
declaration, better option handling, and more portable testing.

Differential Revision: https://reviews.llvm.org/D68720
2020-02-09 09:35:42 +01:00
Craig Topper eeb63944e4 [LegalizeTypes][ARM][AArch64][PowerPC][RISCV][X86] Use BUILD_PAIR to return expanded integer results from ReplaceNodeResults instead of just returning two results.
Remove code from LegalizeTypes that allowed this to work.

We were already using BUILD_PAIR for this in some places so this
standardizes on a single way to do this.
2020-02-08 09:52:31 -08:00
serge-sans-paille 658495e6ec Revert "Support -fstack-clash-protection for x86"
This reverts commit e229017732.

Failures:

http://lab.llvm.org:8011/builders/llvm-clang-x86_64-expensive-checks-debian/builds/2604
http://lab.llvm.org:8011/builders/llvm-clang-win-x-aarch64/builds/4308
2020-02-08 14:26:22 +01:00
serge_sans_paille e229017732 Support -fstack-clash-protection for x86
Implement protection against the stack clash attack [0] through inline stack
probing.

Probe stack allocation every PAGE_SIZE during frame lowering or dynamic
allocation to make sure the page guard, if any, is touched when touching the
stack, in a similar manner to GCC[1].

This extends the existing `probe-stack' mechanism with a special value `inline-asm'.
Technically, the former uses a function call before stack allocation, while this
patch provides inlined stack probes and chunk allocation.

Only implemented for x86.

[0] https://www.qualys.com/2017/06/19/stack-clash/stack-clash.txt
[1] https://gcc.gnu.org/ml/gcc-patches/2017-07/msg00556.html

This is a recommit of 39f50da2a3 with better option
handling and more portable testing.

Differential Revision: https://reviews.llvm.org/D68720
2020-02-08 13:31:52 +01:00
Simon Pilgrim 7f5b3fa73c [X86][SSE] Add X86ISD::FRCP handling to isNegatibleForFree
Peek through X86ISD::FRCP nodes to see if there is a negatible input.
2020-02-08 10:56:27 +00:00
Simon Pilgrim 4229f12a22 [TargetLowering] Remove isDesirableToCombineBuildVectorToShuffleTruncate target hook. NFC.
This hasn't been used for years; its original implementation, D35700, had bugs that caused the reversion of most of the code, and since then x86 shuffle lowering/combining has handled most cases and can deal with the rest as well.
2020-02-08 08:55:51 +00:00
Nico Weber b03c3d8c62 Revert "Support -fstack-clash-protection for x86"
This reverts commit 4a1a0690ad.
Breaks tests on mac and win, see https://reviews.llvm.org/D68720
2020-02-07 14:49:38 -05:00
serge_sans_paille 4a1a0690ad Support -fstack-clash-protection for x86
Implement protection against the stack clash attack [0] through inline stack
probing.

Probe stack allocation every PAGE_SIZE during frame lowering or dynamic
allocation to make sure the page guard, if any, is touched when touching the
stack, in a similar manner to GCC[1].

This extends the existing `probe-stack' mechanism with a special value `inline-asm'.
Technically, the former uses a function call before stack allocation, while this
patch provides inlined stack probes and chunk allocation.

Only implemented for x86.

[0] https://www.qualys.com/2017/06/19/stack-clash/stack-clash.txt
[1] https://gcc.gnu.org/ml/gcc-patches/2017-07/msg00556.html

This is a recommit of 39f50da2a3 with the correct option
flags set.

Differential Revision: https://reviews.llvm.org/D68720
2020-02-07 19:54:39 +01:00
Sanjay Patel de6f7eb47e [x86] don't create an unused constant vector
Noticed while scanning through debug spew. Creating unused
nodes is inefficient and makes following the debug output harder.
2020-02-07 12:05:02 -05:00
Simon Pilgrim c96001035d [X86] isNegatibleForFree - allow pre-legalized FMA negation
As long as the FMA operation is legal (which we can proxy for the FMA3/FMA4 variants as well), we don't have to wait for the LegalOperations stage.
2020-02-07 17:04:17 +00:00
serge-sans-paille f6d98429fc Revert "Support -fstack-clash-protection for x86"
This reverts commit 39f50da2a3.

The -fstack-clash-protection flag is being passed to the linker too,
which is not intended.

Reverting and fixing that in a later commit.
2020-02-07 11:36:53 +01:00
Guillaume Chatelet f85d3408e6 [NFC] Introduce an API for MemOp
Summary: This patch introduces an API for MemOp in order to simplify and tighten the client code.

Reviewers: courbet

Subscribers: arsenm, nemanjai, jvesely, nhaehnle, hiraditya, kbarton, jsji, kerbowa, llvm-commits

Tags: #llvm

Differential Revision: https://reviews.llvm.org/D73964
2020-02-07 11:32:27 +01:00
serge_sans_paille 39f50da2a3 Support -fstack-clash-protection for x86
Implement protection against the stack clash attack [0] through inline stack
probing.

Probe stack allocation every PAGE_SIZE during frame lowering or dynamic
allocation to make sure the page guard, if any, is touched when touching the
stack, in a similar manner to GCC[1].

This extends the existing `probe-stack' mechanism with a special value `inline-asm'.
Technically, the former uses a function call before stack allocation, while this
patch provides inlined stack probes and chunk allocation.

Only implemented for x86.

[0] https://www.qualys.com/2017/06/19/stack-clash/stack-clash.txt
[1] https://gcc.gnu.org/ml/gcc-patches/2017-07/msg00556.html

Differential Revision: https://reviews.llvm.org/D68720
2020-02-07 10:56:15 +01:00
Craig Topper 3f62028f2f [X86] Use SelectionDAG::getAllOnesConstant to simplify some code. NFC 2020-02-06 21:32:53 -08:00
Craig Topper ec9a94af4d [X86] Use MVT::i8 instead of MVT::i64 for shift amount in BuildSDIVPow2
X86 uses i8 for shift amounts. This code can fail on a 32-bit target
if it runs after type legalization.

This code was copied from AArch64 and modified for X86, but the
shift amount wasn't changed to the correct type for X86.

Fixes PR44812
2020-02-06 13:32:13 -08:00
Craig Topper 4175d7e22e [X86] Custom isel floating point X86ISD::CMP on pre-CMOV targets. Eliminate ConvertCmpIfNecessary
If we don't have cmov, X87 compares write to FPSW and we need to
move the bits to EFLAGS to use as JCC/SETCC/CMOV conditions.

Previously this was done by calling ConvertCmpIfNecessary in
multiple places which would emit the extra code for the FNSTSW,
a shift, a truncate, and a SAHF instruction. Isel would then
select trunc+X86ISD::CMP to a FUCOM instruction that produces FPSW.

This patch centralizes all of the handling into a single custom
isel handler. This allows us to remove ConvertCmpIfNecessary and
a couple target specific ISD opcodes.

Differential Revision: https://reviews.llvm.org/D73863
2020-02-06 10:43:06 -08:00
Sanjay Patel 0a389c81cd [x86] use getSplatIndex() in lowerShuffleAsBroadcast()
The old code was doing an N^2 search for splat index.

Differential Revision: https://reviews.llvm.org/D74064
2020-02-05 14:55:02 -05:00
Craig Topper a3d489e87e [X86] Add a DAG combine for (i32 (sext (i8 (x86isd::setcc_carry)))) -> (i32 (x86isd::setcc_carry)) and remove isel patterns.
Same for any_extend though we don't have coverage for that.

The test changes are because isel didn't check that the
setcc_carry had only one use. So in isel we would end up with two
differently sized setcc_carry instructions, and since it clobbers
the flags we would need to recreate the flags for the second
instruction.

This code handles additional uses by truncating the new wide
setcc_carry back to the original size for those uses.
2020-02-04 22:40:36 -08:00
Craig Topper 016f42e3dc [X86] Add custom lowering for lrint/llrint to either cvtss2si/cvtsd2si or fist.
lrint/llrint are defined as rounding using the current rounding
mode. Numbers that can't be converted raise FE_INVALID and an
implementation-defined value is returned. They may also write to
errno.

I believe this means we can use cvtss2si/cvtsd2si or fist to
convert as long as -fno-math-errno is passed on the command line.
Clang will leave them as libcalls if errno is enabled so they
won't become ISD::LRINT/LLRINT in SelectionDAG.

For 64-bit results on a 32-bit target we can't use cvtss2si/cvtsd2si
but we can use fist since it can write to a 64-bit memory location.
Though maybe we could consider using vcvtps2qq/vcvtpd2qq on avx512dq
targets?

gcc also does this optimization.

I think we might be able to do this with STRICT_LRINT/LLRINT as
well, but I've left that for future work.

Differential Revision: https://reviews.llvm.org/D73859
2020-02-04 16:15:40 -08:00
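
A small standalone illustration of why cvtss2si/fist can implement lrint as described above: lrint rounds using the current rounding mode, which is exactly what those instructions do (standard C++ only; nothing here is taken from the patch):

```
#include <cfenv>
#include <cmath>
#include <cstdio>

int main() {
  std::fesetround(FE_TONEAREST);
  std::printf("%ld\n", std::lrint(2.5)); // 2 (ties to even)
  std::fesetround(FE_UPWARD);
  std::printf("%ld\n", std::lrint(2.5)); // 3
  std::fesetround(FE_DOWNWARD);
  std::printf("%ld\n", std::lrint(2.5)); // 2
}
```
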
Reid Kleckner 2d89e0a098 [SEH] Remove CATCHPAD SDNode and X86::EH_RESTORE MachineInstr
The CATCHPAD node mostly existed to be selected into the EH_RESTORE
instruction, which sets the frame back up when 32-bit Windows exceptions
return to the parent function. However, creating this MachineInstr early
increases the risk that other passes will come along and insert
instructions that use the stack before ESP and EBP are restored. That
happened in PR44697.

Instead of representing these in the instruction stream early, delay it
until PEI. Mark the blocks where this needs to happen as EHPads, but not
funclet entry blocks. Passes after PEI have to be careful not to hoist
instructions that can use stack across frame setup instructions, so this
should be relatively reliable.

Fixes PR44697

Reviewed By: hans

Differential Revision: https://reviews.llvm.org/D73752
2020-02-04 15:13:12 -08:00
Craig Topper e195ff98f6 Recommit "[X86] Use X86ISD::SUB instead of X86ISD::CMP in some places."
This time with correct types for the data result from the SUB.

Original commit message:

Our normal lowering for ISD::SETCC uses X86ISD::SUB to enable
CSE unless the RHS is 0. optimizeCompareInstr called by the peephole
pass can turn subs with unused results into cmps to clean this up.

This commit makes other places that create X86ISD::CMP have the
same behavior.
2020-02-04 12:19:34 -08:00
Kadir Cetinkaya d2b6ac6ccd
Revert "[X86] Use X86ISD::SUB instead of X86ISD::CMP in some places."
This reverts commit 8413116bf1.

This seems to be causing crashes while compiling ncurses.
```
$ ./bin/llc bugpoint-reduced-simplified.ll
LLVM ERROR: Cannot emit physreg copy instruction
```

Here are the crashers: https://gist.github.com/kadircet/918f5bb97a2afe048cb875490edba46e

Executing with an llc compiled at 904d54de9b works fine.
2020-02-04 11:22:53 +01:00
Guillaume Chatelet b8144c0536 [NFC] Encapsulate MemOp logic
Summary:
This patch simply introduces functions instead of directly accessing the fields.
This makes it easier to introduce additional check logic. A second patch will add simplifying functions.

Reviewers: courbet

Subscribers: arsenm, nemanjai, jvesely, nhaehnle, hiraditya, kbarton, jsji, kerbowa, llvm-commits

Tags: #llvm

Differential Revision: https://reviews.llvm.org/D73945
2020-02-04 10:36:26 +01:00
Craig Topper cd14b4a62b [X86] Remove unneeded code that looks for (and (i8 (X86setcc_c))
I don't believe we use this construct anymore so I don't think
we need to look for it.
2020-02-03 23:18:11 -08:00
Craig Topper 4581d97416 [X86] Remove some uncovered and possibly broken code from combineZext.
This code matches (zext (trunc (setcc_carry))) -> (and (setcc_carry), 1)
but the code never checks what type we're truncating to. An AND
mask of 1 would only make sense if the trunc was to MVT::i1, but
we didn't check for that.

I believe this code is a leftover from when i1 was a legal type.
2020-02-03 22:59:39 -08:00
Craig Topper 8413116bf1 [X86] Use X86ISD::SUB instead of X86ISD::CMP in some places.
Our normal lowering for ISD::SETCC uses X86ISD::SUB to enable
CSE unless the RHS is 0. optimizeCompareInstr called by the peephole
pass can turn subs with unused results into cmps to clean this up.

This commit makes other places that create X86ISD::CMP have the
same behavior.
2020-02-03 21:01:11 -08:00
Craig Topper c3a47221e0 [X86] Don't emit two X86ISD::COMI/UCOMI nodes when handling comi/ucomi intrinsics.
We were creating two nodes with different operand orders, and then only
using one of them.

Instead just swap the operands when needed and create a single node.
2020-02-03 20:08:01 -08:00
Simon Pilgrim 3ece5a23bd [X86] getTargetShuffleMask - use getConstantOperandVal helper. NFCI. 2020-02-03 18:06:47 +00:00
Simon Pilgrim 8c0e715eb2 [X86] BEXTR SimplifyDemandedBitsForTargetNode - length == 0 -> result = 0 2020-02-03 16:50:03 +00:00
Guillaume Chatelet 333f2ad8b8 [Alignment][NFC] Use Align for getMemcpy/Memmove/Memset
Summary:
This is patch is part of a series to introduce an Alignment type.
See this thread for context: http://lists.llvm.org/pipermail/llvm-dev/2019-July/133851.html
See this patch for the introduction of the type: https://reviews.llvm.org/D64790

Reviewers: courbet

Subscribers: arsenm, dschuff, jyknight, sdardis, nemanjai, jvesely, nhaehnle, sbc100, jgravelle-google, hiraditya, aheejin, kbarton, fedor.sergeev, asb, rbar, johnrusso, simoncook, sabuasal, niosHD, jrtc27, MaskRay, zzheng, edward-jones, atanasyan, rogfer01, MartinMosbeck, brucehoult, the_o, PkmX, jocewei, jsji, Jim, lenary, s.egerton, pzheng, sameer.abuasal, apazos, luismarques, kerbowa, llvm-commits

Tags: #llvm

Differential Revision: https://reviews.llvm.org/D73885
2020-02-03 17:13:19 +01:00
Simon Pilgrim 8ead5df0b1 [X86] computeKnownBitsForTargetNode - add BEXTR support (PR39153)
Add a KnownBits::extractBits helper
2020-02-03 15:43:59 +00:00
Simon Pilgrim a9ee3ffbc0 [X86] Move BEXTR DemandedBits handling inside SimplifyDemandedBitsForTargetNode
Some prep work for PR39153.
2020-02-03 15:16:40 +00:00
Craig Topper cf20fde1d1 [X86] Remove a couple unnecessary calls to ConvertCmpIfNecessary.
We only need to call this on floating point comparisons. In this
case these are known to be integer compares. One of them even
has a SUB opcode instead of CMP.
2020-02-02 21:36:51 -08:00
Craig Topper ee85415dbb [X86] Use MVT::f80 for the result type of the FLD used to convert from SSE register to X87 register in FP_TO_INTHelper. 2020-02-02 13:24:37 -08:00
Simon Pilgrim 5d86ac82a6 Fix a few spelling mistakes in comments. NFCI. 2020-02-02 18:27:43 +00:00
Simon Pilgrim 17e91b7dd2 [X86][SSE] combineBitcastvxi1 - add pre-AVX512 v64i1 handling 2020-02-02 18:00:09 +00:00
Guillaume Chatelet 3c89b75f23 [NFC] Introduce a type to model memory operation
Summary: This is a first step before changing the types to llvm::Align and introduce functions to ease client code.

Reviewers: courbet

Subscribers: arsenm, sdardis, nemanjai, jvesely, nhaehnle, hiraditya, kbarton, jrtc27, atanasyan, jsji, kerbowa, llvm-commits

Tags: #llvm

Differential Revision: https://reviews.llvm.org/D73785
2020-01-31 17:29:01 +01:00
Craig Topper 90c31b0f42 [X86] Custom lower ISD::FROUND with SSE4.1 to avoid a libcall.
ISD::FROUND is defined to round to nearest with ties rounding
away from 0. This mode isn't supported in hardware on X86.

But as long as we aren't compiling with trapping math, we can
emulate this with floor(X + copysign(nextafter(0.5, 0.0), X)).

We have to use nextafter to avoid some corner cases that adding
0.5 would have. For example, if X is nextafter(0.5, 0.0) it should
round to 0.0, but adding 0.5 would need one more bit of mantissa
than can be stored, so it rounds to 1.0. Adding nextafter(0.5, 0.0)
instead will just increase the exponent by 1 and leave the mantissa
as all 1s. This would be nextafter(1.0, 0.0) which will floor to 0.0.

Technically this requires -fno-trapping-math, which isn't our default.
But if we care about exceptions we should be using constrained
intrinsics. Constrained intrinsics would use STRICT_FROUND which
won't go through this code.

Fixes PR42195.

Differential Revision: https://reviews.llvm.org/D73607
2020-01-29 09:10:02 -08:00
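
A minimal scalar sketch of the emulation described above, assuming non-trapping math (illustrative only; the function name is not from the patch):

```
#include <cmath>
#include <cstdio>

// Round to nearest, ties away from zero, built from floor/copysign/nextafter.
double round_away_from_zero(double x) {
  double fudge = std::copysign(std::nextafter(0.5, 0.0), x);
  return std::floor(x + fudge);
}

int main() {
  std::printf("%g %g %g\n", round_away_from_zero(0.5),
              round_away_from_zero(-0.5),
              round_away_from_zero(std::nextafter(0.5, 0.0)));
  // Prints 1 -1 0: the value just below 0.5 stays at 0, which a plain
  // floor(x + 0.5) would round up to 1.
}
```
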
Craig Topper e5edd641fd [X86] Use a shorter sequence to implement FLT_ROUNDS
This code needs to map from the FPCW 2-bit encoding for rounding mode to the 2-bit encoding defined for FLT_ROUNDS. The previous implementation did some clever swapping of bits and adding 1 modulo 4 to do the mapping.

This patch instead uses an 8-bit immediate as a lookup table of four 2-bit values. Then we use the 2-bit FPCW encoding to index the lookup table by using a right shift and an AND. This requires extracting the 2-bit value from FPCW and multiplying it by 2 to make it usable as a shift amount, but this still results in less code.

Differential Revision: https://reviews.llvm.org/D73599
2020-01-29 08:56:33 -08:00
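
A scalar sketch of the lookup-table mapping described above. The 0x2D immediate is derived from the standard FPCW and FLT_ROUNDS encodings (FPCW 0/1/2/3 = nearest/down/up/toward-zero mapping to FLT_ROUNDS 1/3/2/0), not copied from the patch:

```
#include <cstdio>

// Index four 2-bit FLT_ROUNDS values packed into one byte by the 2-bit
// FPCW rounding-control field: shift amount = rc * 2, then mask with 3.
int flt_rounds_from_fpcw(unsigned rc) {
  const unsigned table = 0x2D; // 0b00101101: entry i lives at bits [2i+1:2i]
  return (table >> (rc * 2)) & 3;
}

int main() {
  for (unsigned rc = 0; rc < 4; ++rc)
    std::printf("FPCW %u -> FLT_ROUNDS %d\n", rc, flt_rounds_from_fpcw(rc));
  // Prints 0->1, 1->3, 2->2, 3->0.
}
```
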
Craig Topper ca2abea29a [X86] Use SelectionDAG::getZExtOrTrunc to simplify some code. NFCI 2020-01-28 16:27:59 -08:00
Wang, Pengfei 3d1f0ce3b9 [X86] Add combination for fma and fneg on X86 under strict FP.
Summary: X86 has instructions to calculate fma and fneg at the same time. But we combine the fneg and fma only when fneg is the source operand under strict FP.

Reviewers: craig.topper, andrew.w.kaylor, uweigand, RKSimon, LiuChen3

Subscribers: LuoYuanke, llvm-commits, cfe-commits, jdoerfert, hiraditya

Tags: #llvm

Differential Revision: https://reviews.llvm.org/D72824
2020-01-28 20:09:56 +08:00
Simon Pilgrim 2d5e281b0f [X86][AVX] Add a more aggressive SimplifyMultipleUseDemandedBits to simplify masked store masks.
Fixes a poor codegen issue noticed in PR11210.
2020-01-27 16:44:25 +00:00
Simon Pilgrim fa19d67a2a [X86][AVX] Extend combineCommutableSHUFP to handle v8f32 and v16f32 commutable shufps patterns 2020-01-26 19:04:12 +00:00
Simon Pilgrim 1a81b296cd [X86][SSE] combineCommutableSHUFP - permilps(shufps(load(),x)) --> permilps(shufps(x,load()))
Pull out combineTargetShuffle code added in rG3fd5d1c6e7db into a helper function and extend it to handle shufps(shufps(load(),x),y) and shufps(y,shufps(load(),x)) cases as well.
2020-01-26 14:36:23 +00:00
Craig Topper 3fdd435a4b [X86] Use a macro to convert X86ISD names to strings in getTargetNodeName.
Every case in the switch had a string version of its name. Two
of them had a typo that used : instead of ::.

By using a macro we can automate the string creation and avoid
the possibility of typos like this.

This is similar to what is done on the AMDGPU target.
2020-01-25 18:27:29 -08:00
Craig Topper 2c1decc040 [X86] Break the loop in LowerReturn into 2 loops. NFCI
I believe for STRICT_FP I need to use a STRICT_FP_EXTEND for extending to f80 when returning f32/f64 in 32-bit mode with SSE enabled. The STRICT_FP_EXTEND node requires a Chain. I need to get that node onto the chain before any CopyToRegs are emitted. This is because all the CopyToRegs are glued and chained together. So I can't put a STRICT_FP_EXTEND on the chain between the glued nodes without also glueing the STRICT_FP_EXTEND.

This patch moves all the extend creation to a first pass and then creates the copytoregs and fills out RetOps in a second pass.

Differential Revision: https://reviews.llvm.org/D72665
2020-01-24 14:44:38 -08:00
Simon Pilgrim 3fd5d1c6e7 [X86][SSE] combineTargetShuffle - permilps(shufps(load(),x)) --> permilps(shufps(x,load()))
Moves lowerShuffleWithSHUFPS commutation code from rG30fcd29fe479 to catch cases during combine
2020-01-24 15:23:20 +00:00
Simon Pilgrim 30fcd29fe4 [X86][SSE] lowerShuffleWithSHUFPS - commute '2*V1+2*V2 elements' mask if it allows a load fold
As mentioned on D73023.
2020-01-24 12:04:10 +00:00
Guillaume Chatelet 805c157e8a [Alignment][NFC] Deprecate Align::None()
Summary:
This is a follow up on https://reviews.llvm.org/D71473#inline-647262.
There's a caveat here that `Align(1)` relies on the compiler's understanding of the `Log2_64` implementation to produce good code. One could use `Align()` as a replacement, but I believe it is less clear that the alignment is one in that case.

Reviewers: xbolva00, courbet, bollu

Subscribers: arsenm, dylanmckay, sdardis, nemanjai, jvesely, nhaehnle, hiraditya, kbarton, jrtc27, atanasyan, jsji, Jim, kerbowa, cfe-commits, llvm-commits

Tags: #clang, #llvm

Differential Revision: https://reviews.llvm.org/D73099
2020-01-24 12:53:58 +01:00
Simon Pilgrim 0ec25a0316 [X86] LowerRotate - early out for vector rotates by zero 2020-01-23 17:48:09 +00:00
Guillaume Chatelet 279fa8e006 [Alignement][NFC] Deprecate untyped CreateAlignedLoad
Summary:
This is patch is part of a series to introduce an Alignment type.
See this thread for context: http://lists.llvm.org/pipermail/llvm-dev/2019-July/133851.html
See this patch for the introduction of the type: https://reviews.llvm.org/D64790

Reviewers: courbet

Subscribers: arsenm, jvesely, nhaehnle, hiraditya, kerbowa, llvm-commits

Tags: #llvm

Differential Revision: https://reviews.llvm.org/D73260
2020-01-23 13:34:32 +01:00
Sanjay Patel 363d27c871 [x86] fold vperm2x128 to concat of 128-bit high half vectors
vperm (ins ?, X, C), (ins ?, Y, C), 0x31 --> concat X, Y

This is another shuffle problem seen with PR42024:
https://bugs.llvm.org/show_bug.cgi?id=42024

We have this small crack in legalization/lowering/combining/demanded
that allows forming a vperm2f128 of high halves with AVX1 when we
could do better by peeking through the insert_subvector nodes.
AFAICT, it requires IR as shown in the diffs - much larger than legal
vectors - to avoid all of the usual folds.

Another option would be to prevent forming the 256-bit vperm in lowering.

Differential Revision: https://reviews.llvm.org/D73197
2020-01-22 15:35:50 -05:00
Simon Pilgrim 5340434c94 [X86][SSE] combineExtractWithShuffle - extract(bitcast(broadcast(x))) --> x
Removes some unnecessary gpr<-->fpu traffic
2020-01-22 18:02:58 +00:00
Simon Pilgrim a14aa7dabd [X86][SSE] combineExtractWithShuffle - extract(bitcast(scalar_to_vector(x))) --> x
Removes some unnecessary gpr<-->fpu traffic
2020-01-22 16:11:08 +00:00
Simon Pilgrim c784e5451b Use SelectionDAG::getShiftAmountConstant(). NFCI. 2020-01-22 13:52:43 +00:00
Simon Pilgrim 963f268186 [X86][SSE] combineExtractWithShuffle - pull out repeated extract index code. NFCI. 2020-01-22 12:08:58 +00:00
Simon Pilgrim b065902ed4 [X86] combineBT - use SimplifyDemandedBits instead of GetDemandedBits
Another step towards removing SelectionDAG::GetDemandedBits entirely
2020-01-21 14:24:46 +00:00
Simon Pilgrim eaa4548459 [X86][SSE] Add PACKSS SimplifyMultipleUseDemandedBits 'sign bit' handling.
Attempt to use SimplifyMultipleUseDemandedBits to simplify PACKSS if we're only after the sign bit.
2020-01-20 10:48:54 +00:00
Florian Hahn 0ee1db2d1d [X86] Try to avoid casts around logical vector ops recursively.
Currently PromoteMaskArithmetic only looks at a single operation to
skip casts. This means we miss cases where we combine multiple masks.

This patch updates PromoteMaskArithmetic to try to recursively promote
AND/XOR/AND nodes that terminate in truncates of the right size or
constant vectors.

Reviewers: craig.topper, RKSimon, spatel

Reviewed By: RKSimon

Differential Revision: https://reviews.llvm.org/D72524
2020-01-19 17:22:43 -08:00
Craig Topper 5fa2022ec0 [X86] Remove X86ISD::FILD_FLAG and stop gluing nodes together.
Summary:
I think whatever problem the gluing was fixing has long since been fixed. We don't have any of the restrictions on FP stack stuff that existed back when this was first added.

I had to change which type we use for FILD in BuildFILD when X86 was enabled because most of the isel patterns block f32/f64 instructions when SSE1/SSE2 are enabled. So I needed to use the f80 pattern, but this shouldn't have an effect on the generated code since there is only one FILD instruction anyway. We already use f80 explicitly in other places.

Reviewers: RKSimon, spatel

Reviewed By: RKSimon

Subscribers: andrew.w.kaylor, scanon, hiraditya, llvm-commits

Tags: #llvm

Differential Revision: https://reviews.llvm.org/D72805
2020-01-18 23:44:05 -06:00
Simon Pilgrim 69bc450882 [X86] Rename lowerShuffleAsRotate -> lowerShuffleAsVALIGN
Since it can only ever create VALIGN nodes.
2020-01-18 11:29:14 +00:00
Michael Liao 6d0d86a64d [DAG] Add helper for creating constant vector index with correct type. NFC. 2020-01-18 01:23:36 -05:00
Sanjay Patel 43f60e614a [x86] try harder to form 256-bit unpck*
This is another part of a problem noted in PR42024:
https://bugs.llvm.org/show_bug.cgi?id=42024

The AVX2 code may use awkward 256-bit shuffles vs. the AVX code that gets split
into the expected 128-bit unpack instructions. We have to be selective in
matching the types where we try to do this though. Otherwise, we can end up
with more instructions (in the case of v8x32/v4x64).

Differential Revision: https://reviews.llvm.org/D72575
2020-01-17 10:42:39 -05:00
Craig Topper e445447921 [X86] When handling i64->f32 sint_to_fp on 32-bit targets only bitcast to f64 if sse2 is enabled.
The code is trying to copy the i64 value to an xmm register to
use a 64-bit store so that the 64-bit fild can benefit from
store forwarding.

But this trick only works if f64 is going to be stored in an
XMM register. If we only have SSE1 then only float goes in an XMM
register. So this trick just causes 2 i32 stores, an f64
load into the x87, an f64 store from the x87, and a 64-bit fild. So we end
up with an extra stack temporary and still didn't get store forwarding.

We might be able to use v2f32 here instead, but I didn't check. I
just wanted the code to make sense.

Found by inspection as I continue to stare too hard at our
int_to_fp conversions.
2020-01-15 18:26:28 -08:00
Craig Topper be8f217b18 [X86] Don't call LowerUINT_TO_FP_i32 for i32->f80 on 32-bit targets with sse2.
We were performing an emulated i32->f64 in the SSE registers, then
storing that value to memory and doing an extload into the X87
domain.

After this patch we'll now just store the i32 to memory along
with an i32 0. Then do a 64-bit FILD to f80 completely in the X87
unit. This matches what we do without SSE.
2020-01-15 00:43:07 -08:00
Reid Kleckner 40cd26c700 [Win64] Handle FP arguments more gracefully under -mno-sse
Pass small FP values in GPRs or stack memory according to the normal
convention. This is what gcc -mno-sse does on Win64.

I adjusted the conditions under which we emit an error to check if the
argument or return value would be passed in an XMM register when SSE is
disabled. This has a side effect of no longer emitting an error for FP
arguments marked 'inreg' when targeting x86 with SSE disabled. Our
calling convention logic was already assigning it to FP0/FP1, and then
we emitted this error. That seems unnecessary; we can ignore 'inreg' and
compile it without SSE.

Reviewers: jyknight, aemerson

Differential Revision: https://reviews.llvm.org/D70465
2020-01-14 17:19:35 -08:00
Craig Topper 76291e1158 [X86] Drop an unneeded FIXME. NFC
The extload on X87 is free.
2020-01-14 17:05:46 -08:00
Craig Topper 57eb56b839 [X86] Swap the 0 and the fudge factor in the constant pool for the 32-bit mode i64->f32/f64/f80 uint_to_fp algorithm.
This allows us to generate better code for selecting the fixup
to load.

Previously, when the sign was set we had to load offset 0, and
when it was clear we had to load offset 4. This required a testl,
setns, zero extend, and finally a mul by 4. By switching the offsets
we can just shift the sign bit into the lsb and multiply it by 4.
2020-01-14 17:05:23 -08:00
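
A tiny sketch of the offset selection after the swap described above (hypothetical helper name; shown as plain integer math rather than DAG nodes):

```
#include <cstdint>
#include <cstdio>

// With the constant-pool entries swapped, the byte offset of the fudge
// factor to load is just the sign bit shifted into the lsb and scaled by 4,
// replacing the old testl/setns/zero-extend/multiply sequence.
uint32_t fixup_offset(int64_t x) {
  return (uint32_t)(((uint64_t)x >> 63) << 2); // 0 if sign clear, 4 if set
}

int main() {
  std::printf("%u %u\n", fixup_offset(42), fixup_offset(-42)); // 0 4
}
```
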
Craig Topper 98c54fb1fe [X86] Directly emit a BROADCAST_LOAD from constant pool in lowerUINT_TO_FP_vXi32 to avoid double loads seen in D71971
By directly emitting the constants as a constant pool load we seem to avoid the build_vector/extract_subvector combines that resulted in the duplicate loads we had before.

Differential Revision: https://reviews.llvm.org/D72307
2020-01-14 10:50:39 -08:00
Simon Pilgrim 66e39067ed [X86][AVX] Use lowerShuffleAsLanePermuteAndSHUFP to lower binary v4f64 shuffles.
Only perform this if we are shuffling lower and upper lane elements across the lanes (otherwise splitting to lower xmm shuffles would be better).

This is a regression if we shuffle build_vectors due to getVectorShuffle canonicalizing 'blend of splat' build vectors; for now I've set this not to shuffle build_vector nodes at all to avoid this.
2020-01-12 12:29:41 +00:00
Simon Pilgrim b375f28b0e [X86][AVX] lowerShuffleAsLanePermuteAndSHUFP - only set the demanded elements of the lane mask.
Fixes a cyclic dependency issue with an upcoming patch where getVectorShuffle canonicalizes masks with splat build vector sources.
2020-01-12 09:41:40 +00:00
Craig Topper d692f0f6c8 [X86] Don't call LowerSETCC from LowerSELECT for STRICT_FSETCC/STRICT_FSETCCS nodes.
This causes the STRICT_FSETCC/STRICT_FSETCCS nodes to be lowered
early while lowering SELECT, but the output chain doesn't get
connected. Then we visit the node again when it is its turn
because we haven't replaced the use of the chain result. In the
case of the fp128 libcall lowering, after D72341 this will cause
the libcall to be emitted twice.
2020-01-11 20:43:00 -08:00
Craig Topper ed679804d5 [TargetLowering][X86] Connect the chain from STRICT_FSETCC in TargetLowering::expandFP_TO_UINT and X86TargetLowering::FP_TO_INTHelper. 2020-01-11 17:50:20 -08:00
Simon Pilgrim 24763734e7 [X86] Fix outdated comment
The generic saturated math opcodes are no longer widened inside X86TargetLowering
2020-01-11 14:37:18 +00:00
Simon Pilgrim ce35010d78 [X86][AVX] Add lowerShuffleAsLanePermuteAndSHUFP lowering
Add initial support for lowering v4f64 shuffles to SHUFPD(VPERM2F128(V1, V2), VPERM2F128(V1, V2)); eventually this could be used for v8f32 (and maybe v8f64/v16f32), but I'm being conservative for the initial implementation as only v4f64 can always succeed.

This is currently only called from lowerShuffleAsLanePermuteAndShuffle so it only gets used for unary shuffles, and we limit this to cases where we use upper elements, as otherwise concatenating 2 xmm shuffles is probably the better option.

Helps with poor shuffles mentioned in D66004.
2020-01-11 12:42:00 +00:00
Simon Pilgrim a5bdada09d [X86][AVX] lowerShuffleAsLanePermuteAndShuffle - consistently normalize multi-input shuffle elements
We only use lowerShuffleAsLanePermuteAndShuffle for unary shuffles at the moment, but we should consistently handle lane index calculations for multiple inputs in both the AVX1 and AVX2 paths.

Minor (almost NFC) tidyup as I'm hoping to use lowerShuffleAsLanePermuteAndShuffle for binary shuffles soon.
2020-01-10 17:21:20 +00:00
Matt Arsenault 255cc5a760 CodeGen: Use LLT instead of EVT in getRegisterByName
Only PPC seems to be using it, and only checks some simple cases and
doesn't distinguish between FP. Just switch to using LLT to simplify
use from GlobalISel.
2020-01-09 17:37:52 -05:00
Craig Topper 3811417f39 [X86] Custom type legalize v4i64->v4f32 uint_to_fp on sse4.1 targets in 64-bit mode
For v4i64->v4f32 uint_to_fp on pre-AVX targets where v4i64 isn't legal, we create two v2i64->v2f32 uint_to_fps that need to be shuffled together. Our codegen for v2i64->v2f32 involves detecting if the number is larger than (2^31 - 1); if so, we do a special division by 2 so we can do a signed conversion, which we need to scalarize, then do a multiply by 2 at the end if we divided earlier.

When v4i64 isn't legal we need to split the checking for a larger number and dividing by 2 into two v2i64 vectors. The scalar part can extract the 4 i64 values from those 4 splits. But we can reassemble the 4 scalar f32 results directly into a single v4f32 vector. Then we just need to combine the fixup indications from the 2 halves and we can do the final multiply by 2 fixup on all 4 values if needed at once using a single v4f32 blend and v4f32 fadd.

Differential Revision: https://reviews.llvm.org/D72368
2020-01-08 10:06:01 -08:00
Wang, Pengfei 9a621de1ec [X86] Adding fp128 support for strict fcmp
Summary: Adding fp128 support for strict fcmp

Reviewers: craig.topper, LiuChen3, andrew.w.kaylor, RKSimon, uweigand

Subscribers: hiraditya, llvm-commits, LuoYuanke

Tags: #llvm

Differential Revision: https://reviews.llvm.org/D71897
2020-01-08 12:59:31 +08:00
Craig Topper 9685cf709f [X86] Enable v2i64->v2f32 uint_to_fp code in ReplaceNodeResults on SSE4.1 target
Now that we generate decent code for (v2i64 (setlt zero, X)) on pre-SSE4.2 targets, I think we can use this.

Differential Revision: https://reviews.llvm.org/D72354
2020-01-07 13:25:29 -08:00
Craig Topper afa8211e97 [X86] Improve lowering of (v2i64 (setgt X, -1)) on pre-SSE2 targets. Enable v2i64 in foldVectorXorShiftIntoCmp.
Similar to D72302 but for the canonical form of the opposite case. I've changed foldVectorXorShiftIntoCmp to form a target independent setcc node instead of PCMPGT now and enabled it for v2i64 on pre-SSE4.2 targets. The setcc should eventually get lowered to PCMPGT or the new v2i64 sequence.

Differential Revision: https://reviews.llvm.org/D72318
2020-01-07 11:22:04 -08:00
Craig Topper b9376690a0 [X86] Improve lowering of v2i64 sign bit tests on pre-sse4.2 targets
Without sse4.2 a v2i64 setlt needs to expand into a pcmpgtd, pcmpeqd, 3 shuffles, and 2 logic ops. But if we're only interested in the sign bit of the i64 elements, we can just use one pcmpgtd and shuffle the odd elements to the even elements.

Differential Revision: https://reviews.llvm.org/D72302
2020-01-07 11:22:03 -08:00
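
A scalar check of the observation the commit above relies on: for the sign bit alone, a 64-bit "is negative" test only needs the high 32-bit half (plain C++, not the vector lowering itself):

```
#include <cassert>
#include <cstdint>

int main() {
  const int64_t vals[] = {0, -1, int64_t(1) << 40, -(int64_t(1) << 40),
                          INT64_MIN, INT64_MAX};
  for (int64_t x : vals)
    // x < 0 exactly when the high 32 bits, viewed as a signed i32, are < 0,
    // which is why one pcmpgtd on the odd (high) elements is enough.
    assert((x < 0) == (int32_t(uint64_t(x) >> 32) < 0));
}
```
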
Simon Pilgrim 0e912e22b6 [X86] Pull out repeated SrcVT.getVectorNumElements() call. NFCI. 2020-01-07 16:51:10 +00:00
Simon Pilgrim c0365aaaa4 [X86] Standardize shuffle match/lowering function names. NFC.
We mainly use lowerShuffle*/matchShuffle* - replace the (few) lowerVectorShuffle*/matchVectorShuffle* cases to be consistent.
2020-01-07 13:41:52 +00:00
Craig Topper 6a0564adcf [X86] Improve v4i32->v4f64 uint_to_fp for AVX1/AVX2 targets.
Use zext+or+fsub to do the conversion. Similar to D71971.

Differential Revision: https://reviews.llvm.org/D71971
2020-01-06 14:07:35 -08:00
Craig Topper 95840866b7 [X86] Improve v2i64->v2f32 and v4i64->v4f32 uint_to_fp on avx and avx2 targets.
Summary:
Based on Simon's D52965, but improved to handle strict fp and improve some of the shuffling.

Rather than use v2i1/v4i1 and let type legalization continue, just generate all the code with legal types and use an explicit shuffle.

I also added an explicit setcc to the v4i64 code to match the semantics of vselect which doesn't just use the sign bit. I'm also using a v4i64->v4i32 truncate instead of the shuffle in Simon's original code. With the setcc this will become a pack.

Future work can look into using X86ISD::BLENDV and a different shuffle that only moves the sign bit.

Reviewers: RKSimon, spatel

Reviewed By: RKSimon

Subscribers: hiraditya, llvm-commits

Tags: #llvm

Differential Revision: https://reviews.llvm.org/D71956
2020-01-05 17:44:08 -08:00
Liu, Chen3 ca3bf289a7 [NFC] Modify the format:
Drop the else since we already returned in the if.
2020-01-06 09:35:19 +08:00
Simon Pilgrim e3bd011890 [X86][SSE] Combine combineLogicBlendIntoConditionalNegate for VSELECT nodes (PR43660)
Attempt to use combineLogicBlendIntoConditionalNegate for (select M, (sub 0, X), X) -> (sub (xor X, M), M)

We limit this to cases where we can't easily replace the VSELECT with a shuffle (non-constant masks) or where a BLENDV is likely to occur (which tends to result in slower codegen).
2020-01-05 18:50:44 +00:00
Simon Pilgrim 6a6e6f04ec [X86] Move combineLogicBlendIntoConditionalNegate before combineSelect. NFCI.
Updates function order in preparation of future fix for PR43660
2020-01-05 17:17:41 +00:00
Simon Pilgrim 3db84f142a [X86] Merge (identical) LowerGC_TRANSITION_START and LowerGC_TRANSITION_END (NFC)
Silences a copy+paste analyzer warning - all they are doing are inserting NOOPs in exactly the same way.
2020-01-05 15:24:57 +00:00
Craig Topper 2875cc6b29 [X86] Improve for v2i32->v2f64 uint_to_fp
This uses an alternative implementation of this conversion derived
from our v2i32->v2f32 handling. We can zero extend the v2i32 to
v2i64, OR it with the bit representation of 2.0^52, which will give
us 2.0^52 plus the 32-bit integer since double's mantissa is 52 bits.
Then we just need to subtract 2.0^52 as a double and let the floating
point unit normalize the remaining bits into a valid double.

This is fewer instructions than our previous code, but does require
a port 5 shuffle for the zero extend or unpack.

Differential Revision: https://reviews.llvm.org/D71945
2020-01-03 11:39:08 -08:00
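
A scalar model of the 2^52 trick described above (illustrative only; constants are the standard ones for this trick, and the helper name is hypothetical):

```
#include <cstdint>
#include <cstdio>
#include <cstring>

// OR the 32-bit integer into the low mantissa bits of 2.0^52, then subtract
// 2.0^52 so the FPU normalizes the remaining bits into the final double.
double u32_to_f64(uint32_t x) {
  uint64_t bits = 0x4330000000000000ULL | x; // bit pattern of 2^52 + x
  double d;
  std::memcpy(&d, &bits, sizeof(d));
  return d - 4503599627370496.0;             // subtract 2^52
}

int main() {
  std::printf("%.1f %.1f\n", u32_to_f64(0), u32_to_f64(0xFFFFFFFFu));
  // Prints 0.0 4294967295.0
}
```
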
Reid Kleckner 9c2b72821b Move tail call disabling code to target independent code
When the "disable-tail-calls" attribute was added, checks were added for
it in various backends. Now this code has proliferated, and it is
something the target is responsible for checking. Move that
responsibility back to the ISels (fast, global, and SD).

There's no major functionality change, except for targets that never
implemented this check.

This LLVM attribute was originally added in
d9699bc7bd (2015).

Reviewers: echristo, MaskRay

Differential Revision: https://reviews.llvm.org/D72118
2020-01-03 11:27:41 -08:00
Craig Topper bd46e29742 [X86] Re-enable lowerUINT_TO_FP_vXi32 under fast-math by using an FSUB instead of an FADD.
Summary:
We previously disabled this under fast math due to aggressive
reassociation by the machine combiner. But I think we can work
around this by using a FSUB instead of FADD for the first
operation.

This matches the similar algorithm we do for uint_to_fp i64->f64
in TargetLowering::expandUINT_TO_FP. If reassociation hasn't
been a problem for that, hopefully it's not a problem here.
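For context, a scalar sketch of the kind of two-halves expansion involved, with the combining subtract done first; the exact constants and structure of the real vector lowering may differ, and the helper name is made up:

  #include <cstdint>
  #include <cstring>

  // Embed the 16-bit halves in the mantissas of 2^23 and 2^39, do the combining
  // subtract first (the FSUB mentioned above), then add the low part. Only the
  // final add rounds.
  static float u32_to_f32_sketch(uint32_t u) {
    uint32_t lo_bits = (u & 0xFFFFu) | 0x4B000000u;  // 2^23 + lo16
    uint32_t hi_bits = (u >> 16)     | 0x53000000u;  // 2^39 + hi16 * 2^16
    float lo, hi;
    std::memcpy(&lo, &lo_bits, sizeof(lo));
    std::memcpy(&hi, &hi_bits, sizeof(hi));
    return (hi - 0x1.0001p39f) + lo;                 // 0x1.0001p39 = 2^39 + 2^23
  }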

Reviewers: RKSimon, spatel, scanon

Reviewed By: spatel

Subscribers: hiraditya, llvm-commits

Tags: #llvm

Differential Revision: https://reviews.llvm.org/D71968
2020-01-02 21:46:53 -08:00
Wang, Pengfei 60333a5317 [X86] Enable strict FP by default and remove option -disable-strictnode-mutation. NFCI. 2020-01-03 10:59:34 +08:00
Wang, Pengfei 9dc9e0ea64 [X86] Optimization of inserting vxi1 sub vector into vXi1 vector
Summary:
After the bugfix for the undef value case here, we used more operations to implement inserting a vXi1 subvector into a vXi1 vector; this patch optimizes it to use fewer operations.

The history of this change is at https://reviews.llvm.org/D68311

Reviewers: craig.topper, LuoYuanke, yubing, annita.zhang, pengfei, LiuChen3, RKSimon

Reviewed By: craig.topper

Subscribers: hiraditya, llvm-commits

Patch by Xiang Zhang (xiangzhangllvm)

Differential Revision: https://reviews.llvm.org/D71917
2020-01-03 09:25:25 +08:00
Craig Topper c36763d894 [X86] Call SimplifyMultipleUseDemandedBits from combineVSelectToBLENDV if the condition is used by something other than select conditions.
We might be able to bypass some nodes on the condition path.

Differential Revision: https://reviews.llvm.org/D71984
2020-01-01 11:16:52 -08:00
Liu, Chen3 8af492ade1 add strict float for round operation
Differential Revision: https://reviews.llvm.org/D72026
2020-01-01 20:42:12 +08:00
Craig Topper 26bdc603f7 [X86] Constant fold KSHIFT of an all zeros vector to just an all zeros vector. 2019-12-31 15:57:39 -08:00
Craig Topper 1cc8a74de3 [X86] Use carry flag from add for (seteq (add X, -1), -1).
If we just subtracted 1 and are checking if the result is -1, we can use the carry flag from the ADD instead of an explicit CMP. I'm using the same checks for the add users as EmitTest.
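A scalar illustration of the flag identity, using a widening add to stand in for the hardware carry-out (the helper name is made up):

  #include <cstdint>

  // Adding -1 (all ones) to X carries out exactly when X != 0, so the
  // "is the result -1 after subtracting 1" check can read the ADD's carry flag
  // instead of issuing a separate CMP.
  static bool is_zero_via_carry(uint32_t x) {
    uint64_t sum = (uint64_t)x + 0xFFFFFFFFull;  // X + (-1) in 32 bits
    bool carry = (sum >> 32) != 0;
    return !carry;                               // no carry  <=>  X == 0
  }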

Fixes one case from PR44412

Differential Revision: https://reviews.llvm.org/D72019
2019-12-31 15:05:23 -08:00
Craig Topper e898ba2d15 [X86] Slightly improve our attempted error recovery for 64-bit -mno-sse2 in LowerCallResult to use FP1 if there are two return values.
If the return value is a struct of 2 doubles we need two return
registers.

If SSE2 is disabled we can't return in XMM registers like the ABI says.
After logging an error we attempt to recover by using FP0 instead
of an XMM register. But if the return needs two registers, we may have
already used FP0. So if the register we were supposed to copy to is
XMM1, copy to FP1 in the recovery instead.

This seems to fix the assertion/crash in PR44413.
2019-12-31 00:16:13 -08:00
Craig Topper 47a2fd2df4 [X86] Add X86ISD::PCMPGT to SimplifyMultipleUseDemandedBitsForTargetNode.
If only the sign bit is demanded, and the LHS is all zeroes, then
we can bypass the PCMPGT.
2019-12-30 10:50:25 -08:00
Craig Topper 266cd7717c [X86] Use APInt::isOneValue and ConstantSDNode::isOne. NFC
These are implemented slightly more efficiently than comparing
to 1 in the case that the value is more than 64 bits.
2019-12-29 17:35:49 -08:00
Craig Topper b2f19320dc [X86] Use isOneConstant to simplify some code. NFC 2019-12-29 16:53:38 -08:00
Craig Topper 599d070910 [X86] Remove dyn_casts to ConstantSDNode for operand 1 of X86ISD::VSHLI/VSRAI/VSRLI. Use getConstantOperandVal and APInt operations.
These nodes should only ever be formed with an i8 TargetConstant
so we don't need to check for it to be a constant. It's also
always 8-bits so we don't need to use APInt compare functions.
2019-12-29 16:53:38 -08:00
Craig Topper a5c96e326a [X86] Stop accidentally custom type legalizing v4i32->v4f32 on SSE1 only targets.
We had a Custom operation action for v4i32 on SSE1. But since
v4i32 isn't legal until SSE2 this was not what was intended. The
code that got executed was intended for op legalization and
created a bunch of v4i32 nodes that all ended up scalarized.
2019-12-28 23:11:48 -08:00
Craig Topper ae321faeed [X86] Remove a redundant (scalar_to_vector (extract_vector_elt X))) in LowerUINT_TO_FP_i32. NFCI 2019-12-28 21:49:22 -08:00
Craig Topper fca4736874 [X86] Allow v2i32->v2f32 strict and non-strict uint_to_fp to be widened to v4i32->v4f32 under avx512.
With avx512vl we get v4i32->v4f32 uint_to_fp instructions. With
avx512f we get v16i32->v16f32 instructions which we can use to
emulate v4i32->v4f32.
2019-12-27 00:28:44 -08:00
Craig Topper 20aab49492 [X86] Custom widen v2i32->v2f32 strict_sint_to_fp to avoid scalarization. 2019-12-27 00:28:44 -08:00
Fangrui Song 7a7334663c Delete llvm.{sig,}{setjmp,longjmp} remnant after r136821
Intrinsic has incorrect argument type!
  i32 (i32*)* @llvm.setjmp

*wipes tear*
2019-12-27 00:00:14 -08:00
Craig Topper ecbaf152f8 [X86] Custom widen 128/256-bit vXi32 fp_to_uint on avx512f targets without avx512vl. Similar for vXi64 on avx512dq without avx512vl.
Summary:
Previously we did this with isel patterns that used garbage in
the widened part of the source. But that's not valid for strictfp.
So now we custom widen and use zeroes for the widened elements for
strictfp.

This replaces D71864.

Reviewers: RKSimon, spatel, andrew.w.kaylor, pengfei, LiuChen3

Reviewed By: pengfei

Subscribers: hiraditya, llvm-commits

Tags: #llvm

Differential Revision: https://reviews.llvm.org/D71879
2019-12-26 22:04:40 -08:00
Craig Topper 50fb3957c1 [X86] Custom widen strict v2f32->v2i32 by padding with zeroes.
For non-strict, generic type legalization will take care of this,
but that doesn't happen currently for strict nodes.
2019-12-26 21:45:18 -08:00
Fangrui Song c4a97b64e3 [X86] Fix -Wmisleading-indentation after D71892 2019-12-26 21:41:16 -08:00
Craig Topper 53ee806d93 [X86][FPEnv] Promote some float strictfp operations to double on i686-pc-windows-msvc to match what we do for non-strict.
The float libcalls are inlined in MSVC's math header where they
just cast to double and use the double libcall. Do the same when
we emit libcalls.
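Roughly, the promoted libcall matches what the inlined header code would have done anyway (illustrative sketch only):

  #include <cmath>

  // What the MSVC header effectively inlines for the float overload:
  // widen to double, call the double libcall, narrow the result.
  static float sinf_like(float x) {
    return (float)std::sin((double)x);
  }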
2019-12-26 20:22:24 -08:00
Craig Topper a5d266b9cf [X86] Add custom legalization for strict_uint_to_fp v2i32->v2f32.
I believe the algorithm we use for non-strict is exception safe
for strict. The fsub won't generate any exceptions. After it we
will have an exact version of the i32 integer in a double. Then
we just round it to f32. That rounding will generate a precision
exception if it can't be represented exactly.
2019-12-26 19:10:26 -08:00
Liu, Chen3 1a7b69f5dd add custom operation for strict fpextend/fpround
Differential Revision: https://reviews.llvm.org/D71892
2019-12-27 08:28:33 +08:00
Eric Christopher 1584e2f987 Remove SrcVT only used in an assert and propagate query. 2019-12-26 15:28:32 -08:00
Craig Topper f953882113 [X86] Custom widen 128/256-bit vXi32 uint_to_fp on avx512f targets without avx512vl. Similar for vXi64 sint_to_fp/uint_to_fp on avx512dq without avx512vl.
Previously we widened these through isel patterns, but that
didn't work for STRICT_ nodes. Those need to be padded with
zeroes in the upper bits which is harder to do in isel patterns.
2019-12-26 14:46:56 -08:00
Craig Topper 90ff34e6ab [X86] Add custom widening for v2i32->v2f64 strict_uint_to_fp with AVX512F, but not AVX512VL.
Previously we were widening with isel patterns, but that wasn't
exception safe for strict FP. So now we widen to v4i32->v4f64
during type legalization. And then let op legalization further
widen to v8i32->v8f64.

The vec_int_to_fp.ll changes are caused by us no longer narrowing
extracts of strict_uint_to_fp to the v4i32->v2f64 instruction
without AVX512VL only to have isel rewiden it. Now we just keep
it wide throughout. So we don't have an opportunity to narrow
the load.
2019-12-26 13:40:56 -08:00
Craig Topper bb0138729b [X86] Add custom widening for v2f64->v2i32 strict_fp_to_uint with avx512f, but not avx512vl.
AVX512F added instruction for vector fp_to_uint conversions. With
AVX512VL we can use a specific instruction that does v2f64->v4i32 with
zeroes in the 2 extra elements. For non-strict nodes without AVX512VL
we relied on type legalization to turn it to v4f64->v4i32 which would
later be widened by op legalization to v8f64->v8i32. But type legalization
doesn't currently widen strict nodes since it doesn't know how to
safely and efficiently pad the extra elements. But for X86 we know
padding with zeroes is safe and efficient so do that ourselves.
2019-12-26 12:42:27 -08:00
Craig Topper c91bf72e2c [X86] Merge the SINT_TO_FP/UINT_TO_FP handlers in ReplaceNodeResults since the AVX512DQ+AVX512VL code is very similar in both. NFC 2019-12-26 08:58:34 -08:00
Craig Topper 4e6b0dd681 [X86] Add custom lowering for v2i64->v2f32 strict_sint_to_fp/strict_uint_to_fp for avx512dq+avx512vl targets.
With avx512dq+avx512vl we have instruction that implements this and
places zeroes in the upper 64-bits of the destination xmm register.
2019-12-26 08:58:34 -08:00
Wang, Pengfei 472bded3ed [X86] Enable STRICT_SINT_TO_FP/STRICT_UINT_TO_FP on X86 backend
Summary: Enable STRICT_SINT_TO_FP/STRICT_UINT_TO_FP on X86 backend

Reviewers: craig.topper, RKSimon, LiuChen3, uweigand, andrew.w.kaylor

Subscribers: hiraditya, llvm-commits, LuoYuanke

Tags: #llvm

Differential Revision: https://reviews.llvm.org/D71871
2019-12-26 08:15:13 +08:00
Craig Topper c5b4a2386b [X86] Use zero vector to extend to 512-bits for strict_fp_to_uint v2i1->v2f64 on targets with AVX512F, but not AVX512VL.
In the worst case, this requires a 128-bit move instruction to
implicitly zero the upper bits. In the common case, we should
recognize the producing instruction already zeroed the upper bits.
2019-12-25 10:46:00 -08:00
Craig Topper 2498d88259 [X86] Merge together some common code in LowerFP_TO_INT now that we have STRICT_CVTTP2SI/STRICT_CVTTP2UI nodes. NFC 2019-12-25 09:57:27 -08:00
Liu, Chen3 8304781cae Add missing strict_fp_to_int
Differential Revision: https://reviews.llvm.org/D71867
2019-12-25 16:10:10 +08:00
Craig Topper c06e53119b [X86] Use 128-bit vector instructions for f32/f64->i64 conversions on 32-bit targets with avx512dq and avx512vl instructions.
On 32-bit targets we can't use the scalar instruction so we
insert the scalar into a vector and use packed conversions.
Previously we used either v4f32->v4i64 or v4f64->v4i64 to avoid
some complexity creating target specific ISD opcodes for
v4f32->v2i64. But this causes extra vzeroupper instructions and
possibly frequency throttling on Intel CPUs.

This patch changes this to create a 128-bit vector and uses a
target specific ISD opcode if needed.
2019-12-24 11:20:10 -08:00
Craig Topper a21beccea2 [X86] Add STRICT versions of CVTTP2SI, CVTTP2UI, CMPM, and CMPP.
Differential Revision: https://reviews.llvm.org/D71850
2019-12-24 10:07:04 -08:00
Ulrich Weigand 0d3f782e41 [FPEnv][X86] More strict int <-> FP conversion fixes
Fix several additional problems with the int <-> FP conversion
logic both in common code and in the X86 target. In particular:

- The STRICT_FP_TO_UINT expansion emits a floating-point compare. This
  compare can raise exceptions and therefore needs to be a strict compare.
  I've made it signaling (even though quiet would also be correct) as
  signaling is the more usual default for an LT. This code exists both
  in common code and in the X86 target.

- The STRICT_UINT_TO_FP expansion algorithm was incorrect for strict mode:
  it emitted two STRICT_SINT_TO_FP nodes and then used a select to choose one
  of the results. This can cause spurious exceptions by the STRICT_SINT_TO_FP
  that ends up not chosen. I've fixed the algorithm to use only a single
  STRICT_SINT_TO_FP instead.

- The !isStrictFPEnabled logic in DoInstructionSelection would sometimes do
  the wrong thing because it calls getOperationAction using the result VT.
  But for some opcodes, including [SU]INT_TO_FP, getOperationAction needs to
  be called using the operand VT.

- Remove some (obsolete) code in X86DAGToDAGISel::Select that would mutate
  STRICT_FP_TO_[SU]INT to non-strict versions unnecessarily.

Reviewed by: craig.topper

Differential Revision: https://reviews.llvm.org/D71840
2019-12-23 21:11:45 +01:00
Sanjay Patel 8cefc37be5 [DAGCombine] visitEXTRACT_SUBVECTOR - 'little to big' extract_subvector(bitcast()) support
This moves the X86 specific transform from rL364407
into DAGCombiner to generically handle 'little to big' cases
(for example: extract_subvector(v2i64 bitcast(v16i8))). This
allows us to remove both the x86 implementation and the aarch64
bitcast(extract_subvector(bitcast())) combine.

Earlier patches that dealt with regressions initially exposed
by this patch:
rG5e5e99c041e4
rG0b38af89e2c0

Patch by: @RKSimon (Simon Pilgrim)

Differential Revision: https://reviews.llvm.org/D63815
2019-12-23 10:11:45 -05:00
Craig Topper de2378b4f3 [X86] Fix a KNL miscompile caused by combineSetCC swapping LHS/RHS variables before a later use.
The setcc operands are copied into LHS and RHS variables at the top of the function. We also capture the condition code.

A later piece of code swaps the operands and changes the CC variable as part of a canonicalization to make some other checks simpler. But we might not make the transform we canonicalized for. So we continue on through the function, where we can use the swapped LHS/RHS variables and access the original condition code operand instead of the modified CC variable. This leads to a setcc being created with the original condition code, but with swapped operands.

To mitigate this, this patch does a couple of things. The LHS/RHS/CC variables are made const to keep them from being modified like this again. The transform that needs the swap now uses temporary copies of the variables. And the transform that used the original condition code operand has been altered to use the CC variable we cached originally. Either of these changes is enough to fix the issue, but we do both to make this code very safe.

I also considered rewriting the swap code in some way to check both permutations without explicitly swapping or needing temporary variables, but held off on that.

Differential Revision: https://reviews.llvm.org/D71736
2019-12-20 11:24:45 -08:00
Craig Topper bf507d4259 [X86] Make EmitCmp into a static function and explicitly return chain result for STRICT_FCMP. NFCI
The only thing it's getting from the X86TargetLowering class is
the subtarget, which we can easily pass. This function only has
one call site now, so this might help the compiler inline it.

Explicitly return both the flag result and the chain result for
STRICT_FCMP nodes. This removes an assumption in the caller that
getValue(1) is the right way to get the chain.
2019-12-19 23:03:15 -08:00
Craig Topper 9b6fafa399 [X86] Directly call EmitTest in two places instead of creating a null constant and calling EmitCmp. NFCI
EmitCmp will just immediately call EmitTest and discard the null
constant only to have EmitTest create it again if it doesn't fold.

So just skip all that and go directly to EmitTest.
2019-12-19 23:03:06 -08:00
Liu, Chen3 2f932b5729 Enable STRICT_FP_TO_SINT/UINT on X86 backend
This patch is mainly for custom lowering the vector operation.

Differential Revision: https://reviews.llvm.org/D71592
2019-12-19 14:49:13 +08:00
Wang, Pengfei 1949235d13 [X86] Add strict fma support
Summary: Add strict fma support

Reviewers: craig.topper, RKSimon, LiuChen3

Subscribers: hiraditya, llvm-commits, LuoYuanke

Tags: #llvm

Differential Revision: https://reviews.llvm.org/D71604
2019-12-18 11:44:00 +08:00
Craig Topper 004fdbe041 [X86] Manually format some setOperationAction calls to line up arguments to improve readability. NFC 2019-12-17 16:11:31 -08:00
Kevin P. Neal b1d8576b0a This adds constrained intrinsics for the signed and unsigned conversions
of integers to floating point.

This includes some of Craig Topper's changes for promotion support from
D71130.

Differential Revision: https://reviews.llvm.org/D69275
2019-12-17 10:06:51 -05:00
Alex Richardson be15dfa88f [NFC] Use EVT instead of bool for getSetCCInverse()
Summary:
The use of a boolean isInteger flag (generally initialized using
VT.isInteger()) caused errors in our out-of-tree CHERI backend
(https://github.com/CTSRD-CHERI/llvm-project).

In our backend, pointers use a separate ValueType (iFATPTR) and therefore
.isInteger() returns false. This meant that getSetCCInverse() was using the
floating-point variant and generated incorrect code for us:
`(void *)0x12033091e < (void *)0xffffffffffffffff` would return false.

Committing this change will significantly reduce our merge conflicts
for each upstream merge.

Reviewers: spatel, bogner

Reviewed By: bogner

Subscribers: wuzish, arsenm, sdardis, nemanjai, jvesely, nhaehnle, hiraditya, kbarton, jrtc27, atanasyan, jsji, llvm-commits

Tags: #llvm

Differential Revision: https://reviews.llvm.org/D70917
2019-12-13 12:22:03 +00:00
Sanjay Patel cdf5cfea8e Revert "[SDAG] remove use restriction in isNegatibleForFree() when called from getNegatedExpression()"
This reverts commit d1f0bdf2d2.
The patch can cause infinite loops in DAGCombiner.
2019-12-11 16:56:58 -05:00
Sanjay Patel d1f0bdf2d2 [SDAG] remove use restriction in isNegatibleForFree() when called from getNegatedExpression()
This is an alternate fix for the bug discussed in D70595.
This also includes minimal tests for other in-tree targets
to show the problem more generally.

We check the number of uses as a predicate for whether some
value is free to negate, but that use count can change as we
rewrite the expression in getNegatedExpression(). So something
that was marked free to negate during the cost evaluation
phase becomes not free to negate during the rewrite phase (or
the inverse - something that was not free becomes free).
This can lead to a crash/assert because we expect that
everything in an expression that is negatible to be handled
in the corresponding code within getNegatedExpression().

This patch skips the use check during the rewrite phase.
So we determine that some expression isNegatibleForFree
(identically to without this patch), but during the rewrite,
don't rely on use counts to decide how to create the optimal
expression.

Differential Revision: https://reviews.llvm.org/D70975
2019-12-11 13:30:39 -05:00
Craig Topper 935d41e4bd [X86] Split v64i1 arguments into 2 v32i1s that will be promoted to v32i8 under min-legal-vector-width=256
This is an improvement to 88dacbd436
2019-12-10 17:29:02 -08:00
Wang, Pengfei 21bc8631fe [FPEnv][X86] Constrained FCmp intrinsics enabling on X86
Summary: This is a follow up of D69281, it enables the X86 backend support for the FP comparision.

Reviewers: uweigand, kpn, craig.topper, RKSimon, cameron.mcinally, andrew.w.kaylor

Subscribers: hiraditya, llvm-commits, annita.zhang, LuoYuanke, LiuChen3

Tags: #llvm

Differential Revision: https://reviews.llvm.org/D70582
2019-12-11 08:23:09 +08:00
Craig Topper 88dacbd436 [X86] Go back to considering v64i1 as a legal type under min-legal-vector-width=256. Scalarize v64i1 arguments and shuffles under min-legal-vector-width=256.
This reverts 3e1aee2ba7 in favor
of a different approach.

Scalarizing isn't great codegen, but making the type illegal was
interfering with k constraint in inline assembly.
2019-12-10 15:07:55 -08:00
Liu, Chen3 bbf7860b93 add support for strict operation fpextend/fpround/fsqrt on X86 backend
Differential Revision: https://reviews.llvm.org/D71184
2019-12-10 09:04:28 +08:00
Amara Emerson 84fdd9d7a5 [X86] Fix prolog/epilog mismatch for stack protectors on win32-macho.
The xor'ing behaviour is only used for msvc/crt environments; when we're targeting
macho, the guard load code doesn't know about the xor in the epilog. Disable xor'ing
when targeting win32-macho to be consistent.

Differential Revision: https://reviews.llvm.org/D71095
2019-12-06 14:44:56 -08:00
Craig Topper 28b573d249 [TargetLowering] Fix another potential FPE in expandFP_TO_UINT
D53794 introduced code to perform the FP_TO_UINT expansion via FP_TO_SINT in a way that would never expose floating-point exceptions in the intermediate steps. Unfortunately, I just noticed there is still a way this can happen. As discussed in D53794, the compiler now generates this sequence:

// Sel = Src < 0x8000000000000000
// Val = select Sel, Src, Src - 0x8000000000000000
// Ofs = select Sel, 0, 0x8000000000000000
// Result = fp_to_sint(Val) ^ Ofs
The problem is with the Src - 0x8000000000000000 expression. As I mentioned in the original review, that expression can never overflow or underflow if the original value is in range for FP_TO_UINT. But I missed that we can get an Inexact exception in the case where Src is a very small positive value. (In this case the result of the sub is ignored, but that doesn't help.)

Instead, I'd suggest using the following sequence:

// Sel = Src < 0x8000000000000000
// FltOfs = select Sel, 0, 0x8000000000000000
// IntOfs = select Sel, 0, 0x8000000000000000
// Result = fp_to_sint(Val - FltOfs) ^ IntOfs
In the case where the value is already in range of FP_TO_SINT, we now simply compute Val - 0, which definitely cannot trap (unless Val is a NaN, in which case we'd want to trap anyway).

In the case where the value is not in range of FP_TO_SINT, but still in range of FP_TO_UINT, the sub can never be inexact, as Val is between 2^(n-1) and (2^n)-1, i.e. always has the 2^(n-1) bit set, and the sub is always simply clearing that bit.
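A scalar rendering of the suggested sequence for f64 -> u64 (illustrative only; the helper name is made up, and 0x1.0p63 is 2^63, i.e. 0x8000000000000000 as a double):

  #include <cstdint>

  // When Src is below 2^63 we subtract 0.0 (exact); otherwise an in-range Src
  // always has the 2^63 bit set, so the subtract just clears that bit and is
  // also exact. Only the fp_to_sint itself can raise Inexact.
  static uint64_t fp_to_uint_sketch(double src) {
    bool sel = src < 0x1.0p63;
    double flt_ofs   = sel ? 0.0 : 0x1.0p63;
    uint64_t int_ofs = sel ? 0 : (1ull << 63);
    return (uint64_t)(int64_t)(src - flt_ofs) ^ int_ofs;
  }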

There is a slight complication in the case where Val is a constant, so we know at compile time whether Sel is true or false. In that scenario, the old code would automatically optimize the sub away, while this no longer happens with the new code. Instead, I've added extra code to check for this case and then just fall back to FP_TO_SINT directly. (This seems to catch even slightly more cases.)

Original version of the patch by Ulrich Weigand. X86 changes added by Craig Topper

Differential Revision: https://reviews.llvm.org/D67105
2019-12-06 14:11:04 -08:00
Reid Kleckner c089f02898 [X86] Don't setup and teardown memory for a musttail call
Summary:
musttail calls should not require allocating extra stack for arguments.
Updates to arguments passed in memory should happen in place before the
epilogue.

This bug was mostly a missed optimization, unless inalloca was used and
store to push conversion fired.

If a reserved call frame was used for an inalloca musttail call, the
call setup and teardown instructions would be deleted, and SP
adjustments would be inserted in the prologue and epilogue. You can see
these are removed from several test cases in this change.

In the case where the stack frame was not reserved, i.e. call frame
optimization fires and turns argument stores into pushes, then the
imbalanced call frame setup instructions created for inalloca calls
become a problem. They remain in the instruction stream, resulting in a
call setup that allocates zero bytes (expected for inalloca), and a call
teardown that deallocates the inalloca pack. This deallocation was
unbalanced, leading to subsequent crashes.

Reviewers: hans

Subscribers: hiraditya, llvm-commits

Tags: #llvm

Differential Revision: https://reviews.llvm.org/D71097
2019-12-06 12:58:54 -08:00
Craig Topper 8267be2995 [X86] Make X86TargetLowering::BuildFILD return a std::pair of SDValues so we explicitly return the chain instead of calling getValue on the single SDValue.
We shouldn't assume that the returned result can be used to get
the other result.

This is prep-work for strict FP where we will also need to pass
the chain result along in more cases.
2019-12-05 17:54:21 -08:00
Liu, Chen3 3041434450 Add strict fp support for instructions fadd/fsub/fmul/fdiv
Differential Revision: https://reviews.llvm.org/D68757
2019-12-06 09:44:33 +08:00
Craig Topper 3d43c73f26 [X86] Remove override of shouldUseStrictFP_TO_INT for fp80. NFC
I suspect this became unnecessary after r354161. Prior to that
we may have been going through the default expansion of FP_TO_UINT
on 64-bit targets and then ending up back in Custom X86 handling
to handle the FP_TO_SINT for it. Now we just Custom handle the
FP_TO_UINT directly. We already need to handle it for 32-bit mode
during type legalization so we wouldn't save any code by using
the default expansion on 64-bit.
2019-12-04 17:58:10 -08:00