Commit Graph

19987 Commits

Author SHA1 Message Date
Simon Pilgrim ff307c8120 [X86] combineFneg - generalize FMA negations with isNegatibleForFree/getNegatedExpression
This has an interesting side effect: it improves some UMAX/UMIN reduction code that had redundant XOR(SHUFFLE(XOR(X,SIGNMASK)),SIGNMASK) patterns - getNegatibleCost recognises them as FNEG(SHUFFLE(FNEG(X))). We have a lot of FNEG patterns bitcasted to the integer domain for XOR signbit twiddling, which is similar to what we do to allow UMAX/UMIN to be lowered using SMAX/SMIN.

Differential Revision: https://reviews.llvm.org/D74231
2020-02-12 16:07:27 +00:00
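
A minimal sketch of the sign-bit equivalence behind the fold above (illustrative C++, not LLVM code; names are made up):

  #include <cassert>
  #include <cstdint>
  #include <cstring>

  // XOR with the IEEE-754 sign mask in the integer domain is exactly FNEG,
  // which is why XOR(X,SIGNMASK) chains can be folded as negations.
  float fnegViaXor(float X) {
    uint32_t Bits;
    std::memcpy(&Bits, &X, sizeof(Bits));
    Bits ^= 0x80000000u; // flip the sign bit
    std::memcpy(&X, &Bits, sizeof(X));
    return X;
  }

  int main() {
    assert(fnegViaXor(1.5f) == -1.5f);
    // The redundant double-XOR pattern collapses: FNEG(FNEG(X)) == X.
    assert(fnegViaXor(fnegViaXor(2.0f)) == 2.0f);
  }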
Simon Pilgrim 9eb426c88c [TargetLowering] Add NegatibleCost enum for isNegatibleForFree return codes
The isNegatibleForFree/getNegatedExpression methods currently rely on a raw char value to indicate whether a negation is beneficial or not.

This patch replaces the char return value with a NegatibleCost enum to make the intent clearer.

It also renames isNegatibleForFree to getNegatibleCost to more accurately reflect what's going on.

Differential Revision: https://reviews.llvm.org/D74221
2020-02-12 11:51:42 +00:00
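
A sketch of the shape such an enum could take (enumerator names assumed from the description above, not copied from the tree):

  // Illustrative only - the point is replacing a magic char with named states.
  enum class NegatibleCost {
    Cheaper,   // the negated form is cheaper than the original expression
    Neutral,   // the negated form costs the same
    Expensive, // the negated form is more expensive; the fold isn't worth it
  };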
Djordje Todorovic 97ed706a96 Revert "[DebugInfo] Enable the debug entry values feature by default"
This reverts commit rG9f6ff07f8a39.

Found a test failure on clang-with-thin-lto-ubuntu buildbot.
2020-02-12 11:59:04 +01:00
Djordje Todorovic 9f6ff07f8a [DebugInfo] Enable the debug entry values feature by default
This patch enables the debug entry values feature.

  - Remove the (CC1) experimental -femit-debug-entry-values option
  - Enable it for x86, arm and aarch64 targets
  - Resolve the test failures
  - Leave the llc experimental option for targets that do not
    support the CallSiteInfo yet

Differential Revision: https://reviews.llvm.org/D73534
2020-02-12 10:25:14 +01:00
Craig Topper 746395a446 [X86] Remove unnecessary hasSideEffects = 0, mayLoad = 1 from an instruction with a pattern. NFC 2020-02-11 23:26:29 -08:00
Craig Topper 3988b7046a [X86] Correct the predicate on some patterns for 128 and 256 EVEX versions of VCVTPS2PH.
These should require AVX512VL, not AVX512F. The legacy VEX patterns
will match first unless AVX512VL is enabled, so this doesn't cause
a functional issue.
2020-02-11 23:26:29 -08:00
Craig Topper 0daf9b8e41 [X86][LegalizeTypes] Add SoftPromoteHalf support for STRICT_FP_EXTEND and STRICT_FP_ROUND
This adds a strict version of FP16_TO_FP and FP_TO_FP16 and uses
them to implement soft promotion for the half type. This is
enough to provide basic support for __fp16 with strictfp.

Add the necessary X86 support to use VCVTPS2PH/VCVTPH2PS when F16C
is enabled.
2020-02-11 22:30:04 -08:00
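
Roughly what this enables at the source level, sketched in Clang C++ (__fp16 is a Clang extension; the exact strictfp driver flag, e.g. -ffp-exception-behavior=strict, is an assumption here):

  // f16 values are soft-promoted: widened to f32 for arithmetic and
  // narrowed back, with the strict nodes keeping exception semantics.
  __fp16 scale(__fp16 H) {
    float F = H;      // STRICT_FP_EXTEND via the strict FP16_TO_FP node
    F *= 2.0f;        // arithmetic performed in f32
    return (__fp16)F; // STRICT_FP_ROUND via the strict FP_TO_FP16 node
  }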
Craig Topper 846d0ac43e [X86] Don't disable code in combineHorizontalPredicateResult just because we have avx512
We aren't doing a good job of optimizing AVX512 outside of this code, so remove the bail out for AVX512 and replace it with a FIXME. This at least gets us the AVX2 codegen.

Differential Revision: https://reviews.llvm.org/D74431
2020-02-11 14:36:29 -08:00
Craig Topper d7de7ac370 [X86] Raise the latency for VectorImul from 4 to 5 in Skylake scheduler models
Based on uops.info these should have 5 cycle latency as they did on Haswell/Broadwell. I have no additional internal information from Intel.

This was also shown as a discrepancy in the spreadsheet that was sent with an early llvm-dev post about llvm-exegesis.
It also matches Agner Fog.

Differential Revision: https://reviews.llvm.org/D74357
2020-02-11 11:24:25 -08:00
Nikita Popov 5eb19bf4a2 [X86CmovConversion] Make heuristic for optimized cmov depth more conservative (PR44539)
Fix/workaround for https://bugs.llvm.org/show_bug.cgi?id=44539.
As discussed there, this pass makes some overly optimistic
assumptions, as it does not have access to actual branch weights.

This patch makes the computation of the depth of the optimized cmov
more conservative, by assuming a distribution of 75/25 rather than
50/50 and placing the weights to get the more conservative result
(larger depth). The fully conservative choice would be
std::max(TrueOpDepth, FalseOpDepth), but that would break at least
one existing test (which may or may not be an issue in practice).

Differential Revision: https://reviews.llvm.org/D74155
2020-02-11 17:33:11 +01:00
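
A sketch of the weighting described above (function name illustrative): placing the 75% weight on the deeper operand yields a larger, more conservative estimate than the old 50/50 average, while staying below the fully conservative max.

  #include <algorithm>

  unsigned optimizedCmovDepth(unsigned TrueOpDepth, unsigned FalseOpDepth) {
    // Old 50/50: (TrueOpDepth + FalseOpDepth) / 2
    // New 75/25, weighted toward the deeper operand:
    return (3 * std::max(TrueOpDepth, FalseOpDepth) +
            std::min(TrueOpDepth, FalseOpDepth)) / 4;
  }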
Eric Astor 8d5bf0422b [ms] [llvm-ml] Add support for attempted register parsing
Summary:
Add a new method (tryParseRegister) that attempts to parse a register specification.

MASM allows the use of IFDEF <register>, as well as IFDEF <symbol>. To accommodate this, we make it possible to check whether a register specification can be parsed at the current location, without failing the entire parse if it can't.

Reviewers: thakis

Reviewed By: thakis

Tags: #llvm

Differential Revision: https://reviews.llvm.org/D73486
2020-02-11 10:45:33 -05:00
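
A toy sketch of the attempted-parse shape (illustrative register table and signature, not the actual MC interface): the probe reports "no match" instead of erroring out, which is what IFDEF <register> needs.

  #include <optional>
  #include <string>
  #include <unordered_map>

  // Returns the register number if Tok names a register, or std::nullopt so
  // the caller can fall back to treating Tok as an ordinary symbol without
  // failing the entire parse.
  std::optional<unsigned> tryParseRegister(const std::string &Tok) {
    static const std::unordered_map<std::string, unsigned> Regs = {
        {"eax", 0}, {"ecx", 1}, {"edx", 2}, {"ebx", 3}};
    auto It = Regs.find(Tok);
    if (It == Regs.end())
      return std::nullopt;
    return It->second;
  }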
Simon Pilgrim fa620fc8e2 [X86] combineConcatVectorOps - reuse IsSplat and remove duplicate code. NFC. 2020-02-11 13:37:57 +00:00
Simon Pilgrim 11c16e7159 [X86][SSE] lowerShuffleAsBitRotate - lower vXi8 shuffles to ROTL on pre-SSSE3 targets
Without PSHUFB we are better off using ROTL (expanding to OR(SHL,SRL)) than the generic v16i8 shuffle lowering - but if we can widen to v8i16 or wider then the existing shuffles are still the better option.
2020-02-11 12:21:03 +00:00
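
The OR(SHL,SRL) expansion mentioned above, sketched per i8 lane (illustrative scalar code):

  #include <cassert>
  #include <cstdint>

  // ROTL expanded to OR(SHL, SRL) for a single byte lane.
  uint8_t rotl8(uint8_t X, unsigned N) {
    N &= 7;
    return N == 0 ? X : (uint8_t)((X << N) | (X >> (8 - N)));
  }

  int main() {
    assert(rotl8(0x81, 1) == 0x03); // 1000'0001 -> 0000'0011
  }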
Craig Topper 798305d29b [X86] Custom lower ISD::FP16_TO_FP and ISD::FP_TO_FP16 on f16c targets instead of using isel patterns.
We need to use vector instructions for these operations. Previously
we handled this with isel patterns that used extra instructions
and copies to handle the conversions.

Now we use custom lowering to emit the conversions. This allows
them to be pattern matched and optimized on their own. For
example, we can now emit vpextrw to store the result if it's going
directly to memory.

I've forced the upper elements of the VCVTPH2PS input to zero to
keep some code similar. Zeroes will be needed for strictfp. I've
added a DAG combine for (fp16_to_fp (fp_to_fp16 X)) to avoid extra
instructions in between, to stay closer to the previous codegen.

This is a step towards strictfp support for f16 conversions.
2020-02-10 22:01:48 -08:00
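
Roughly the sequence described, sketched with F16C intrinsics (illustrative; assumes a target compiled with -mf16c):

  #include <immintrin.h>
  #include <cstdint>

  // Scalar f16 -> f32: MOVD places the half in lane 0 with the upper
  // elements zeroed, then VCVTPH2PS widens it.
  float halfToFloat(uint16_t H) {
    __m128i V = _mm_cvtsi32_si128(H);
    return _mm_cvtss_f32(_mm_cvtph_ps(V));
  }

  // Scalar f32 -> f16: VCVTPS2PH narrows; lane 0 of the result is what
  // could be stored straight to memory with vpextrw.
  uint16_t floatToHalf(float F) {
    __m128i V = _mm_cvtps_ph(_mm_set_ss(F),
                             _MM_FROUND_TO_NEAREST_INT | _MM_FROUND_NO_EXC);
    return (uint16_t)_mm_cvtsi128_si32(V);
  }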
Simon Pilgrim f319074824 [X86] combineConcatVectorOps - combine X86ISD::PACKSS ops 2020-02-10 17:48:02 +00:00
Simon Pilgrim 74c0f98cf5 [X86] combineConcatVectorOps - combine X86ISD::VPERMI ops 2020-02-10 17:48:01 +00:00
Simon Pilgrim 2463b8c97d [X86] combineConcatVectorOps - combine VSHLI/VSRAI/VSRLI ops
Non-AVX512BW targets failed to concatenate 256-bit shifts back to 512-bits (split during 512-bit shuffle lowering as they don't have v32i16/v64i8 types).
2020-02-10 16:59:09 +00:00
Simon Pilgrim 06617c4522 [X86] Add lowerShuffleAsBitRotate (PR44379)
As noted on PR44379, we didn't attempt to lower vector shuffles using bit rotations on XOP/AVX512F targets.

This patch lowers to uniform ISD::ROTL nodes - ROTR isn't supported by XOP and they are interchangeable for constant values anyway.

There might be cases where targets without ISD::ROTL support would benefit from this (expanding to SRL+SHL+OR), which I'll investigate in a future patch.

REAPPLIED rGe82e17d4d4ca after reversion at rG39eade73a567 - fixed offset matching in matchShuffleAsBitRotate.
2020-02-10 16:16:56 +00:00
Simon Pilgrim 39eade73a5 Revert rGe82e17d4d4cac8b2df00094e80d5e1cb22795664 - [X86] Add lowerShuffleAsBitRotate (PR44379)
As noted on PR44379, we didn't attempt to lower vector shuffles using bit rotations on XOP/AVX512F targets.

This patch lowers to uniform ISD::ROTL nodes - ROTR isn't supported by XOP and they are interchangeable for constant values anyway.

There might be cases where targets without ISD::ROTL support would benefit from this (expanding to SRL+SHL+OR), which I'll investigate in a future patch.

Also, non-AVX512BW targets fail to concatenate 256-bit rotations back to 512-bits (split during shuffle lowering as they don't have v32i16/v64i8 types).
---
Internal shuffle tests indicate there's a bug somewhere that I haven't been able to track down yet.
2020-02-10 12:14:26 +00:00
Djordje Todorovic 3a4dc577c9 [CSInfo] Fix the assertions regarding updating the CSInfo
The call site info was not updated correctly when deleting
corresponding call instructions.

Differential Revision: https://reviews.llvm.org/D73700
2020-02-10 10:55:06 +01:00
Djordje Todorovic 68908993eb [CSInfo] Use isCandidateForCallSiteEntry() when updating the CSInfo
Use isCandidateForCallSiteEntry() when updating the call site info.
This should mostly be an NFC, but some parts now ensure that
moveCallSiteInfo() and copyCallSiteInfo() operate on call site
entry candidates (both Src and Dest should be call site entry
candidates).

Differential Revision: https://reviews.llvm.org/D74122
2020-02-10 10:03:14 +01:00
Craig Topper 06ba969c9d [X86] Make (insert_vector_elt (v8i16 zerovec), i16 %x, 0) generate the same code as (v8i16 (build_vector %x, 0, 0, 0, 0, 0, 0, 0)).
Instead of using an insrw to element 0, use movzx and movd.

Same for v16i8.
2020-02-09 21:52:11 -08:00
Craig Topper 05d44204fa [X86] Use MOVZX instead of MOVSX in f16_to_fp isel patterns.
Using sign extend forces the adjacent element to either all zeros
or all ones. But all ones is a NaN, so that doesn't seem like a
great idea.

I'm working on supporting this with strict FP, where a NaN would
definitely be bad.
2020-02-09 20:39:52 -08:00
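
Why "all ones is a NaN" (illustrative bit check for IEEE-754 binary16): sign-extending a negative 16-bit value fills the adjacent lane with 0xFFFF, whose exponent field is all ones with a nonzero mantissa.

  #include <cassert>
  #include <cstdint>

  // binary16 NaN: exponent bits (0x7C00) all set and mantissa nonzero.
  bool isNaNPatternF16(uint16_t Bits) {
    return (Bits & 0x7C00) == 0x7C00 && (Bits & 0x03FF) != 0;
  }

  int main() {
    assert(isNaNPatternF16(0xFFFF));  // what sign extension can produce
    assert(!isNaNPatternF16(0x0000)); // what zero extension produces
  }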
Simon Pilgrim 29e646fe65 [X86] combineConcatVectorOps - combine VROTLI/VROTRI ops
Fix issue mentioned on rGe82e17d4d4ca - non-AVX512BW targets failed to concatenate 256-bit rotations back to 512-bits (split during shuffle lowering as they don't have v32i16/v64i8 types).
2020-02-09 21:50:10 +00:00
Craig Topper 656d66f5fc [X86] Use custom isel for (X86sbb_flag 0, 0) so we can use 32-bit SBB for i8/i16.
We were using MOV32r0 and an extract_subreg as an input. By using
custom isel we can move the extract_subreg to after the SBB instead
of on the input.
2020-02-09 13:19:35 -08:00
Craig Topper e1cbfecdb8 [X86] Add flag result VT to a MOV32r0 created in X86DAGToDAGISel::Select
The flag isn't used, but I believe this matches the MOV32r0 that
would be created by the table emitter. This should allow this node
to be CSEed with any others created by the table.
2020-02-09 13:19:21 -08:00
Simon Pilgrim e82e17d4d4 [X86] Add lowerShuffleAsBitRotate (PR44379)
As noted on PR44379, we didn't attempt to lower vector shuffles using bit rotations on XOP/AVX512F targets.

This patch lowers to uniform ISD::ROTL nodes - ROTR isn't supported by XOP and they are interchangeable for constant values anyway.

There might be cases where targets without ISD::ROTL support would benefit from this (expanding to SRL+SHL+OR), which I'll investigate in a future patch.

Also, non-AVX512BW targets fail to concatenate 256-bit rotations back to 512-bits (split during shuffle lowering as they don't have v32i16/v64i8 types).
2020-02-09 21:15:03 +00:00
Craig Topper dd262222b4 [X86] Use MVT::i32 for the type of a MOV32r0 created in X86DAGToDAGISel::Select.
Not sure if this really matters. The VT isn't really used after
this point. At best it might affect CSE.
2020-02-09 11:57:42 -08:00
Craig Topper dbcc1392b3 [X86] Remove isel patterns that include a vselect/X86selects and a strict FP node.
A vselect+strictfp node is not equivalent to a masked operation.
The exceptions of the strictfp node are not masked by a vselect
that follows it, so we can't match it to a masked operation.

We already had a hack in IsLegalToFold to prevent these patterns from
matching. This patch removes that hack and removes the patterns.
2020-02-09 11:45:54 -08:00
Simon Pilgrim 29621b2534 [X86] Rename matchShuffleAsRotate - matchShuffleAsByteRotate. NFCI.
A matchShuffleAsBitRotate variant will be added soon and we need to make the difference more obvious.
2020-02-09 18:35:50 +00:00
Simon Pilgrim 3ec6de07e9 Fix signed/unsigned warning. 2020-02-09 13:35:03 +00:00
Simon Pilgrim 644d56b432 [X86] Recognise ROTLI/ROTRI rotations as faux shuffles
Allows us to combine rotations with shuffles.

One of many things necessary to fix PR44379 (lowering shuffles to rotations).
2020-02-09 12:25:49 +00:00
serge_sans_paille e67cbac812 Support -fstack-clash-protection for x86
Implement protection against the stack clash attack [0] through inline stack
probing.

Probe stack allocation every PAGE_SIZE during frame lowering or dynamic
allocation to make sure the page guard, if any, is touched when touching the
stack, in a similar manner to GCC[1].

This extends the existing `probe-stack` mechanism with a special value `inline-asm`.
Technically the former uses a function call before stack allocation while this
patch provides inlined stack probes and chunk allocation.

Only implemented for x86.

[0] https://www.qualys.com/2017/06/19/stack-clash/stack-clash.txt
[1] https://gcc.gnu.org/ml/gcc-patches/2017-07/msg00556.html

This is a recommit of 39f50da2a3 with proper LiveIn
declaration, better option handling and more portable testing.

Differential Revision: https://reviews.llvm.org/D68720
2020-02-09 10:42:45 +01:00
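
A hedged sketch of the inline-probing idea (illustrative C++, not the actual frame-lowering code; the 4096-byte page size is an assumption):

  #include <cstddef>

  constexpr std::size_t kPageSize = 4096; // assumed PAGE_SIZE

  // Touch the stack once per page while extending it downward, so a guard
  // page in the range traps before any access can jump past it.
  void probeStackRange(volatile char *Base, std::size_t Bytes) {
    for (std::size_t Off = kPageSize; Off <= Bytes; Off += kPageSize)
      Base[-(std::ptrdiff_t)Off] = 0;
  }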
serge-sans-paille 4546211600 Revert "Support -fstack-clash-protection for x86"
This reverts commit 0fd51a4554.

Failures:

http://lab.llvm.org:8011/builders/llvm-clang-win-x-armv7l/builds/4354
2020-02-09 10:06:31 +01:00
serge_sans_paille 0fd51a4554 Support -fstack-clash-protection for x86
Implement protection against the stack clash attack [0] through inline stack
probing.

Probe stack allocation every PAGE_SIZE during frame lowering or dynamic
allocation to make sure the page guard, if any, is touched when touching the
stack, in a similar manner to GCC[1].

This extends the existing `probe-stack` mechanism with a special value `inline-asm`.
Technically the former uses a function call before stack allocation while this
patch provides inlined stack probes and chunk allocation.

Only implemented for x86.

[0] https://www.qualys.com/2017/06/19/stack-clash/stack-clash.txt
[1] https://gcc.gnu.org/ml/gcc-patches/2017-07/msg00556.html

This is a recommit of 39f50da2a3 with proper LiveIn
declaration, better option handling and more portable testing.

Differential Revision: https://reviews.llvm.org/D68720
2020-02-09 09:35:42 +01:00
Craig Topper e629674176 [X86] Add more scalar intrinsic instructions to isNonFoldablePartialRegisterLoad.
I think this covers most, if not all, of the scalar intrinsic
instructions.
2020-02-08 20:41:36 -08:00
Craig Topper 0152b106ae [X86] Add the recently added (V)CVTSS2SI/CVTSD2SI instructions used for LRINT/LLRINT to the load folding tables. 2020-02-08 17:54:48 -08:00
Craig Topper d643a39aba [X86] Use any_fadd/sub/mul/div/sqrt with the AVX512 scalar_*_patterns.
Making sure not to use them with patterns for masked instructions.

Also fix FMA patterns that were matching strict_fma+x86selects to
masked instructions.
2020-02-08 15:54:40 -08:00
Craig Topper eeb63944e4 [LegalizeTypes][ARM][AArch64][PowerPC][RISCV][X86] Use BUILD_PAIR to return expanded integer results from ReplaceNodeResults instead of just returning two results.
Remove code from LegalizeTypes that allowed this to work.

We were already using BUILD_PAIR for this in some places so this
standardizes on a single way to do this.
2020-02-08 09:52:31 -08:00
Simon Pilgrim 4aa7b9cc96 [X86] X86InstComments - add FMA4 comments
These typically match the FMA3 equivalents, although the multiply operands sometimes get flipped due to the FMA3 permute variants.
2020-02-08 17:02:00 +00:00
Simon Pilgrim 10417ad2e4 [X86] Standardize BROADCAST enum names (PR31079)
Tweak EVEX implementation names so they match the other variants by adding the 'r' prefix. Oddly, some of the subvec broadcast ops already matched.
2020-02-08 16:55:00 +00:00
Simon Pilgrim 0ed79e9b8f [X86] Standardize VPSLLDQ/VPSRLDQ enum names (PR31079)
Tweak EVEX implementation names so they match the other variants.
2020-02-08 14:54:44 +00:00
serge-sans-paille 658495e6ec Revert "Support -fstack-clash-protection for x86"
This reverts commit e229017732.

Failures:

http://lab.llvm.org:8011/builders/llvm-clang-x86_64-expensive-checks-debian/builds/2604
http://lab.llvm.org:8011/builders/llvm-clang-win-x-aarch64/builds/4308
2020-02-08 14:26:22 +01:00
serge_sans_paille e229017732 Support -fstack-clash-protection for x86
Implement protection against the stack clash attack [0] through inline stack
probing.

Probe stack allocation every PAGE_SIZE during frame lowering or dynamic
allocation to make sure the page guard, if any, is touched when touching the
stack, in a similar manner to GCC[1].

This extends the existing `probe-stack` mechanism with a special value `inline-asm`.
Technically the former uses a function call before stack allocation while this
patch provides inlined stack probes and chunk allocation.

Only implemented for x86.

[0] https://www.qualys.com/2017/06/19/stack-clash/stack-clash.txt
[1] https://gcc.gnu.org/ml/gcc-patches/2017-07/msg00556.html

This is a recommit of 39f50da2a3 with better option
handling and more portable testing.

Differential Revision: https://reviews.llvm.org/D68720
2020-02-08 13:31:52 +01:00
Benjamin Kramer e4230a9f6c ArrayRef'ize spillCalleeSavedRegisters. NFCI. 2020-02-08 12:19:23 +01:00
Simon Pilgrim 7f5b3fa73c [X86][SSE] Add X86ISD::FRCP handling to isNegatibleForFree
Peek through X86ISD::FRCP nodes to see if there is a negatible input.
2020-02-08 10:56:27 +00:00
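
The identity this relies on, sketched on scalars (FRCP itself is an approximation, but the sign behaviour is what matters here):

  #include <cassert>

  int main() {
    float X = 4.0f;
    // 1/(-x) == -(1/x): only the sign bit differs, never the magnitude,
    // so FNEG can be pushed through FRCP to whichever side is free.
    assert(1.0f / -X == -(1.0f / X));
  }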
Simon Pilgrim 4229f12a22 [TargetLowering] Remove isDesirableToCombineBuildVectorToShuffleTruncate target hook. NFC.
This hasn't been used for years; its original implementation, D35700, had bugs that caused the reversion of most of the code. Since then, x86 shuffle lowering/combining has handled most cases and can deal with the rest as well.
2020-02-08 08:55:51 +00:00
Nico Weber b03c3d8c62 Revert "Support -fstack-clash-protection for x86"
This reverts commit 4a1a0690ad.
Breaks tests on mac and win, see https://reviews.llvm.org/D68720
2020-02-07 14:49:38 -05:00
serge_sans_paille 4a1a0690ad Support -fstack-clash-protection for x86
Implement protection against the stack clash attack [0] through inline stack
probing.

Probe stack allocation every PAGE_SIZE during frame lowering or dynamic
allocation to make sure the page guard, if any, is touched when touching the
stack, in a similar manner to GCC[1].

This extends the existing `probe-stack` mechanism with a special value `inline-asm`.
Technically the former uses a function call before stack allocation while this
patch provides inlined stack probes and chunk allocation.

Only implemented for x86.

[0] https://www.qualys.com/2017/06/19/stack-clash/stack-clash.txt
[1] https://gcc.gnu.org/ml/gcc-patches/2017-07/msg00556.html

This is a recommit of 39f50da2a3 with correct option
flags set.

Differential Revision: https://reviews.llvm.org/D68720
2020-02-07 19:54:39 +01:00
Craig Topper 278578744a [X86] Handle SETB_C32r/SETB_C64r in flag copy lowering the same way we handle SBB
Previously we took the restored flag in a GPR, extended it to 32 or 64 bits, and then used it as an input to a sub from 0. This required creating a zero extend and creating a 0.

This patch changes this to just use an ADD with 255 to restore the carry flag and keep the SETB_C32r/SETB_C64r, exactly like we handle SBB, which is what SETB becomes.

Differential Revision: https://reviews.llvm.org/D74152
2020-02-07 10:31:19 -08:00
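
The ADD-with-255 trick on plain integers (illustrative): for a materialized flag byte, adding 255 carries out of the low 8 bits exactly when the byte is nonzero, restoring CF without a zero extend or an explicit 0.

  #include <cassert>
  #include <cstdint>

  // Carry out of an 8-bit ADD r8, 255 occurs iff the flag byte is nonzero.
  bool carryAfterAdd255(uint8_t FlagByte) {
    return (unsigned)FlagByte + 255u > 255u;
  }

  int main() {
    assert(!carryAfterAdd255(0x00)); // CF stays clear
    assert(carryAfterAdd255(0x01));  // CF restored
    assert(carryAfterAdd255(0xFF));  // SETB_C materializes all-ones
  }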