Commit Graph

7511 Commits

Author SHA1 Message Date
Kazu Hirata 913515e465 [Target] Use llvm::is_contained (NFC) 2020-12-13 19:35:10 -08:00
Simon Pilgrim d5c434d7dd [X86][SSE] combineX86ShufflesRecursively - add basic handling for combining shuffles of different widths (PR45974)
If a faux shuffle uses smaller shuffle inputs, try to recursively combine with those inputs directly instead of widening them immediately. Then widen all smaller inputs at the bottom of the recursion.

This will still mean we're generating nodes on the fly (PR45974) even if we don't combine to a new shuffle, but it does help AVX2+ targets combine across xmm/ymm/zmm types, mainly as variable shuffles.
2020-12-13 17:18:07 +00:00
Simon Pilgrim 47321c311b [X86][SSE] combineReductionToHorizontal - add vXi8 ISD::MUL reduction handling (PR39709)
Default expansion leads to repeated extensions/truncations to/from vXi16 which shuffle combining and demanded elts can't completely unravel.

Better just to promote (any_extend) the input and perform a vXi16 reduction.

We'll be able to remove a lot of this if we ever get decent legalization support for reduction intrinsics in SelectionDAG.
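A scalar model of why the promotion is safe (an illustrative sketch, not the actual SelectionDAG code): multiplication only depends on the low bits of its operands, so the whole reduction can run in i16 with a single truncation at the end.

unsigned char mul_reduce_16xi8(const unsigned char v[16]) {
    unsigned short acc = v[0];                /* any_extend the input to i16 */
    for (int i = 1; i < 16; ++i)
        acc = (unsigned short)(acc * v[i]);   /* i16 multiplies */
    return (unsigned char)acc;                /* truncate once at the end */
}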
2020-12-13 15:22:54 +00:00
Luo, Yuanke f80b29878b [X86] AMX programming model.
This patch implements the AMX programming model that was discussed on llvm-dev
 (http://lists.llvm.org/pipermail/llvm-dev/2020-August/144302.html).
 Thanks to Hal for the good suggestion on the RA. The fast RA is not in this patch yet.
 This patch implements 7 components.

1. The C interface to the end user (see the sketch below).
2. The AMX intrinsics in LLVM IR.
3. Transform load/store <256 x i32> to AMX intrinsics or split the
   type into two <128 x i32>.
4. The lowering from AMX intrinsics to AMX pseudo instructions.
5. Insert pseudo ldtilecfg and build the def-use chain between ldtilecfg and the
   AMX instructions.
6. The register allocation for tile registers.
7. Morph AMX pseudo instructions to real AMX instructions.
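A hypothetical sketch of component 1, the C interface, using the AMX intrinsics from immintrin.h; the tile shapes, the config-buffer layout, and the build flags (-mamx-tile -mamx-int8) are assumptions for illustration:

#include <immintrin.h>
#include <string.h>

void dot_product_tiles(const void *a, const void *b, void *c, long stride) {
    char cfg[64];
    memset(cfg, 0, sizeof(cfg));
    cfg[0] = 1;                          /* palette 1 */
    for (int t = 0; t < 3; ++t) {
        cfg[48 + t] = 16;                /* rows[t]  = 16 (assumed layout) */
        cfg[16 + 2 * t] = 64;            /* colsb[t] = 64 bytes (assumed layout) */
    }
    _tile_loadconfig(cfg);               /* the ldtilecfg from component 5 */
    _tile_zero(0);                       /* tmm0 = 0 (accumulator) */
    _tile_loadd(1, a, stride);           /* tmm1 = A */
    _tile_loadd(2, b, stride);           /* tmm2 = B */
    _tile_dpbssd(0, 1, 2);               /* tmm0 += tmm1 * tmm2 (i8 dot product) */
    _tile_stored(0, c, stride);          /* store the i32 accumulator tile */
    _tile_release();
}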

Change-Id: I935e1080916ffcb72af54c2c83faa8b2e97d5cb0

Differential Revision: https://reviews.llvm.org/D87981
2020-12-10 17:01:54 +08:00
Saleem Abdulrasool ee74d1b420 X86: use a data driven configuration of Windows x86 libcalls (NFC)
Rather than creating a series of associated calls and ensuring that
everything is lined up, use a table driven approach that ensures that
the two always stay in sync.
2020-12-09 22:49:11 +00:00
Simon Pilgrim 24184dbb82 [X86] Fold CONCAT(VPERMV3(X,Y,M0),VPERMV3(Z,W,M1)) -> VPERMV3(CONCAT(X,Z),CONCAT(Y,W),CONCAT(M0,M1))
Further prep work toward supporting different subvector sizes in combineX86ShufflesRecursively
2020-12-09 14:29:32 +00:00
Kerry McLaughlin 4519ff4b6f [SVE][CodeGen] Add the ExtensionType flag to MGATHER
Adds the ExtensionType flag, which reflects the LoadExtType of a MaskedGatherSDNode.
Also updated SelectionDAGDumper::print_details so that details of the gather
load (is signed, is scaled & extension type) are printed.

Reviewed By: sdesmalen

Differential Revision: https://reviews.llvm.org/D91084
2020-12-09 11:19:08 +00:00
Harald van Dijk 29c8ea6f1a
[X86] Handle localdynamic TLS model in x32 mode
D92346 added TLS_(base_)addrX32 to handle TLS in x32 mode, but missed the
different TLS models. This diff fixes the logic for the local dynamic model
where `RAX` was used when `EAX` should be, and extends the tests to cover
all four TLS models.

Fixes https://bugs.llvm.org/show_bug.cgi?id=26472.
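A hypothetical reproducer sketch for the local dynamic case; the exact build invocation is an assumption:

/* assumed flags: clang -target x86_64-linux-gnux32 -fPIC -ftls-model=local-dynamic */
static __thread int tls_a;
static __thread int tls_b;
int tls_sum(void) { return tls_a + tls_b; }  /* one __tls_get_addr call, two offsets */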

Reviewed By: RKSimon

Differential Revision: https://reviews.llvm.org/D92737
2020-12-08 21:06:00 +00:00
Tim Northover c5978f42ec UBSAN: emit distinctive traps
Sometimes people get minimal crash reports after a UBSAN incident. This change
tags each trap with an integer representing the kind of failure encountered,
which can aid in tracking down the root cause of the problem.
2020-12-08 10:28:26 +00:00
Simon Pilgrim 0101fb73de [X86] Fold MOVMSK(ICMP_SGT(X,-1)) -> NOT(MOVMSK(X)))
Noticed while triaging PR37506
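A sketch of the equivalence with SSE2 intrinsics (illustration only): icmp_sgt(x, -1) is true exactly when the sign bit is clear, so its movmsk is the bitwise NOT of movmsk(x) over the 16 lanes.

#include <emmintrin.h>
int before_fold(__m128i x) {   /* MOVMSK(ICMP_SGT(X, -1)) */
    return _mm_movemask_epi8(_mm_cmpgt_epi8(x, _mm_set1_epi8(-1)));
}
int after_fold(__m128i x) {    /* NOT(MOVMSK(X)), limited to the 16 lane bits */
    return ~_mm_movemask_epi8(x) & 0xFFFF;
}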
2020-12-06 17:56:41 +00:00
Layton Kifer ac522f8700 [DAGCombiner] Fold (sext (not i1 x)) -> (add (zext i1 x), -1)
Move fold of (sext (not i1 x)) -> (add (zext i1 x), -1) from X86 to DAGCombiner to improve codegen on other targets.
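A scalar model of the fold (illustration only): both sides map true to 0 and false to -1.

int before_fold(_Bool x) { return -(int)!x; }   /* sext(not i1 x): true -> 0, false -> -1 */
int after_fold(_Bool x)  { return (int)x - 1; } /* add(zext i1 x, -1): same mapping */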

Differential Revision: https://reviews.llvm.org/D91589
2020-12-06 11:52:10 -05:00
Simon Pilgrim b96a521077 [X86] LowerRotate - enable custom lowering of ROTL/ROTR vXi16 on VBMI2 targets. 2020-12-04 12:16:59 +00:00
Simon Pilgrim d073805be6 [X86] LowerRotate - VBMI2 targets can lower vXi16 rotates using funnel shifts.
Ideally we'd do this inside DAGCombine but until we can make the FSHL/FSHR opcodes legal for VBMI2 it won't help us.
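A scalar sketch of the identity being used (illustration only): a rotate is a funnel shift with both inputs equal, so vXi16 rotates can be emitted as VBMI2 funnel-shift instructions.

unsigned short fshl16(unsigned short a, unsigned short b, unsigned n) {
    n &= 15;                         /* funnel shift amount is modulo the bit width */
    return n ? (unsigned short)((a << n) | (b >> (16 - n))) : a;
}
unsigned short rotl16(unsigned short x, unsigned n) {
    return fshl16(x, x, n);          /* rotl(x, n) == fshl(x, x, n) */
}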
2020-12-04 11:29:23 +00:00
Simon Pilgrim df1ddc4234 [X86] Let VBMI2 non-VLX targets still use funnel shifts instructions 2020-12-04 11:06:43 +00:00
Simon Pilgrim 8eedd18fcb [X86] Remove unnecessary bitcast. NFC.
The X86ISD::SUBV_BROADCAST node is already of type VT.
2020-12-04 09:44:57 +00:00
Xiang1 Zhang f2e2924463 [X86] Unbind the ebx with GOT address in regcall calling convention
No register can be allocated for an indirect call when it uses the regcall calling
convention and passes 5 or more args.
For example:
call vreg (ag1, ag2, ag3, ag4, ag5, ...) --> 5 regs (EAX, ECX, EDX, ESI, EDI)
are used to pass args, 1 reg (EBX) is used to hold the GOT pointer, so no reg can be
allocated to vreg.

The Intel386 architecture provides 8 general purpose 32-bit registers. RA
mostly uses 6 of them (EAX, EBX, ECX, EDX, ESI, EDI). 5 of these regs can be
used to pass function arguments (EAX, ECX, EDX, ESI, EDI).
EBX is used to hold the GOT pointer when making function calls via the PLT.
ESP and EBP are usually "reserved" in register allocation.
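A hypothetical reproducer sketch (build flags like -m32 -fPIC are assumptions): with five regcall arguments consuming EAX/ECX/EDX/ESI/EDI and EBX pinned to the GOT pointer, no register is left for the indirect call target.

typedef int __attribute__((regcall)) (*fn5_t)(int, int, int, int, int);
int call5(fn5_t fn) {
    return fn(1, 2, 3, 4, 5);  /* the indirect call target also needs a register */
}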

Reviewed By: LuoYuanke

Differential Revision: https://reviews.llvm.org/D91020
2020-12-04 10:00:13 +08:00
Harald van Dijk c9be4ef184
[X86] Add TLS_(base_)addrX32 for X32 mode
LLVM has TLS_(base_)addr32 for 32-bit TLS addresses in 32-bit mode, and
TLS_(base_)addr64 for 64-bit TLS addresses in 64-bit mode. x32 mode wants 32-bit
TLS addresses in 64-bit mode, which were not yet handled. This adds
TLS_(base_)addrX32 as copies of TLS_(base_)addr64, except that they use
tls32(base)addr rather than tls64(base)addr, and then restricts
TLS_(base_)addr64 to 64-bit LP64 mode, TLS_(base_)addrX32 to 64-bit ILP32 mode.

Reviewed By: RKSimon

Differential Revision: https://reviews.llvm.org/D92346
2020-12-02 22:20:36 +00:00
Simon Pilgrim f019362329 [X86] EltsFromConsecutiveLoads - remove old FIXME comment. NFC.
It's unlikely an undef element in a zero vector will be of any use.
2020-12-02 17:21:41 +00:00
Simon Pilgrim 3900ec6f05 [X86] combineX86ShufflesRecursively - remove old FIXME comment. NFC.
It's unlikely an undef element in a zero vector will be of any use, and SimplifyDemandedVectorElts now calls combineX86ShufflesRecursively, so it's unlikely we actually have a dependency on these specific elements.
2020-12-02 16:29:38 +00:00
Simon Pilgrim 0dab7ecc5d [X86] EltsFromConsecutiveLoads - pull out repeated NumLoadedElts. NFCI. 2020-12-02 16:29:37 +00:00
Simon Pilgrim 1b209ff9e3 [DAG] Move vselect(icmp_ult, 0, sub(x,y)) -> usubsat(x,y) to DAGCombine (PR40111)
Move the X86 VSELECT->USUBSAT fold to DAGCombiner - there's nothing target specific about these folds.
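A scalar model of the matched pattern (illustration only): select(x <u y, 0, x - y) is exactly the definition of usubsat(x, y), subtraction clamped at zero, so it can be recognized as a single USUBSAT node.

unsigned usubsat_pattern(unsigned x, unsigned y) {
    unsigned d = x - y;        /* sub(x, y), wraps when x < y */
    return x < y ? 0u : d;     /* vselect(icmp_ult(x, y), 0, d) == usubsat(x, y) */
}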
2020-12-01 14:25:29 +00:00
Simon Pilgrim 6dbd0d36a1 [DAG] Move vselect(icmp_ult, -1, add(x,y)) -> uaddsat(x,y) to DAGCombine (PR40111)
Move the X86 VSELECT->UADDSAT fold to DAGCombiner - there's nothing target specific about these folds.

The SSE42 test diffs are relatively benign - it's avoiding an extra constant load in exchange for an extra xor operation - there are extra register moves, which is annoying as all those operations should commute them away.

Differential Revision: https://reviews.llvm.org/D91876
2020-12-01 11:56:26 +00:00
Harald van Dijk cdac34bd47
[X86] Zero-extend pointers to i64 for x86_64
For LP64 mode, this has no effect as pointers are already 64 bits.
For ILP32 mode (x32), this extension is specified by the ABI.

Reviewed By: pengfei

Differential Revision: https://reviews.llvm.org/D91338
2020-11-30 18:51:23 +00:00
Simon Pilgrim 83d79ca5bf [X86][AVX512] Only lower to VPALIGNR if we have BWI (PR48322) 2020-11-30 10:51:24 +00:00
Simon Pilgrim 969918e177 [DAG] Legalize umin(x,y) -> sub(x,usubsat(x,y)) and umax(x,y) -> add(x,usubsat(y,x)) iff usubsat is legal
If usubsat() is legal, this is likely to result in smaller codegen expansion than the default cmp+select codegen expansion.

Allows us to move the x86-specific lowering to the generic expansion code.
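A scalar model of the two expansions (illustration only):

unsigned usubsat(unsigned x, unsigned y) { return x > y ? x - y : 0u; }
/* x - usubsat(x, y): if x > y this is y, otherwise x  ->  umin(x, y) */
unsigned umin_expand(unsigned x, unsigned y) { return x - usubsat(x, y); }
/* x + usubsat(y, x): if y > x this is y, otherwise x  ->  umax(x, y) */
unsigned umax_expand(unsigned x, unsigned y) { return x + usubsat(y, x); }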

Differential Revision: https://reviews.llvm.org/D92183
2020-11-27 11:18:58 +00:00
Simon Pilgrim 8057ebf4a0 Revert rG12d59b696b330 "[DAG] Legalize umin(x,y) -> sub(x,usubsat(x,y)) and umax(x,y) -> add(x,usubsat(y,x)) iff usubsat is legal"
This reverts commit 12d59b696b.

Prematurely pushed this to trunk
2020-11-26 15:07:45 +00:00
Simon Pilgrim 12d59b696b [DAG] Legalize umin(x,y) -> sub(x,usubsat(x,y)) and umax(x,y) -> add(x,usubsat(y,x)) iff usubsat is legal
If usubsat() is legal, this is likely to result in smaller codegen expansion than the default cmp+select codegen expansion.

Allows us to move the x86-specific lowering to the generic expansion code.
2020-11-26 14:47:28 +00:00
Simon Pilgrim 791040cd8b [DAG] LowerMINMAX - move default expansion to generic TargetLowering::expandIntMINMAX
This is part of the discussion on D91876 about trying to reduce custom lowering of MIN/MAX ops on older SSE targets - if we can improve generic vector expansion we should be able to relax the limitations in SelectionDAGBuilder when it will let MIN/MAX ops be generated, and avoid having to flag so many ops as 'custom'.
2020-11-22 13:02:27 +00:00
Simon Pilgrim 0341029bb4 [X86][AVX] LowerADDSAT_SUBSAT - avoid X86ISD::BLENDV in UADDSAT/USUBSAT v8i32/v4i64 lowering
Use the OR(CMP,ADD) / AND(CMP,SUB) patterns like we do on SSE targets.

Enable custom lowering for v8i32/v4i64 and generalize the 128-bit lowering code for any vector size - this also lets us use the slightly cheaper codegen for icmp_ugt instead of umin/umax.
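A scalar model of the two patterns (illustration only), with the compare mask written as all-ones/all-zeros the way a vector compare produces it:

unsigned uaddsat_or(unsigned x, unsigned y) {
    unsigned sum = x + y;
    unsigned cmp = -(unsigned)(sum < x);  /* all-ones on unsigned overflow */
    return sum | cmp;                     /* OR(CMP, ADD): saturate to all-ones */
}
unsigned usubsat_and(unsigned x, unsigned y) {
    unsigned cmp = -(unsigned)(x > y);    /* all-ones when the sub won't wrap */
    return (x - y) & cmp;                 /* AND(CMP, SUB): clamp to zero */
}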
2020-11-20 18:16:44 +00:00
Craig Topper a7eae62a42 [SelectionDAG][X86][PowerPC][Mips] Replace the default implementation of LowerOperationWrapper with the X86 and PowerPC version.
The default version only works if the returned node has a single
result. The X86 and PowerPC versions support multiple results
and allow a single result to be returned from a node with
multiple outputs. And allow a single result that is not result 0
of the node.

Also replace the Mips version since the new version should work
for it. The original version handled multiple results, but only
if the new node and original node had the same number of results.

Differential Revision: https://reviews.llvm.org/D91846
2020-11-20 10:06:53 -08:00
Simon Pilgrim 09a081f221 [X86][SSE] LowerADDSAT_SUBSAT - avoid X86ISD::BLENDV in UADDSAT/USUBSAT custom lowering
Use the OR(CMP,ADD) / AND(CMP,SUB) patterns like we do on pre-SSE4 targets.

We're still using X86ISD::BLENDV on some AVX targets as we don't do custom lowering for >= 256-bit vectors.

Really this (and combineVSelectWithAllOnesOrZeros) needs moving to DAGCombiner, but pre-SSE42 we see the vXi64 comparison type as a 2 x 32-bits result so we can't just rely on ComputeNumSignBits to give us the 'all bits' result we need.
2020-11-20 16:53:01 +00:00
Simon Pilgrim 14ae02fb33 [X86][AVX] Only share broadcasts of different widths from the same SDValue of the same SDNode (PR48215)
D57663 allowed us to reuse broadcasts of the same scalar value by extracting low subvectors from the widest type.

Unfortunately we weren't ensuring the broadcasts were from the same SDValue, just the same SDNode - which failed on multiple-value nodes like ISD::SDIVREM

FYI: I intend to request this be merged into the 11.x release branch.

Differential Revision: https://reviews.llvm.org/D91709
2020-11-19 12:15:18 +00:00
Craig Topper f0b0bab34d [X86] Use GF2P8AFFINEQB to implement vector bitreverse.
We can use GF2P8AFFINEQB to reverse bits in a byte. Shuffles are needed to reverse the bytes in elements larger than i8. LegalizeVectorOps takes care of inserting the shuffle for the larger element size.

We already have Custom lowering for v16i8 with SSSE3, v32i8 with AVX, and v64i8 with AVX512BW.

I think we might be able to use this for scalars too by moving into a vector and back. But I'll save that for a follow-up as it's a little more involved.
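A sketch of the per-byte reversal (illustration only; the matrix constant is the commonly used GFNI bit-reversal matrix, and -mgfni is an assumed build flag):

#include <immintrin.h>
__m128i bitreverse_bytes(__m128i x) {
    const __m128i rev = _mm_set1_epi64x((long long)0x8040201008040201ULL);
    return _mm_gf2p8affine_epi64_epi8(x, rev, 0);  /* GF2P8AFFINEQB, no final xor */
}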

Reviewed By: RKSimon, pengfei

Differential Revision: https://reviews.llvm.org/D91515
2020-11-17 23:49:06 -08:00
Craig Topper 57c0c4a275 [X86] Fix crash with i64 bitreverse on 32-bit targets with XOP.
We unconditionally marked i64 as Custom, but did not install a
handler in ReplaceNodeResults for when i64 isn't a legal type. This
leads to ReplaceNodeResults asserting.

We have two options to fix this. Only mark i64 as Custom on
64-bit targets and let it expand to two i32 bitreverses which
each need a VPPERM. Or the other option is to add the Custom
handling to ReplaceNodeResults. This is what I went with.
2020-11-15 19:02:34 -08:00
Craig Topper 114f044640 [X86] Use EVT::getIntegerVT instead of MVT::getIntegerVT where the type can be i2 or i4.
This was a mistake introduced in D91294. I'm not sure how to
exercise this with the existing code, but I hit it while trying
some follow up experiments.
2020-11-12 21:48:45 -08:00
Craig Topper a4124e455e [X86] When storing v1i1/v2i1/v4i1 to memory, make sure we store zeros in the rest of the byte
We can't store garbage in the unused bits. It's possible that something like a zextload from i1/i2/i4 is created to read the memory. Those zextloads would be legalized assuming the extra bits are 0.

I'm not sure that the code in lowerStore is executed for the v1i1/v2i1/v4i1 case. It looks like the DAG combine in combineStore may have converted them to v8i1 first. And I think we're missing some cases to avoid going to the stack in the first place. But I don't have time to investigate those things at the moment so I wanted to focus on the correctness issue.

Should fix PR48147.
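A scalar sketch of the invariant (illustration only): the packed byte must keep its padding bits zero so that a later zextload-style read is correct.

unsigned char store_v4i1(_Bool b0, _Bool b1, _Bool b2, _Bool b3) {
    /* bits 0-3 hold the i1 elements; bits 4-7 must be stored as zero */
    return (unsigned char)(b0 | b1 << 1 | b2 << 2 | b3 << 3);
}
unsigned char zextload_i4(const unsigned char *p) {
    return *p;  /* legal only because the padding bits are known to be zero */
}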

Reviewed By: RKSimon

Differential Revision: https://reviews.llvm.org/D91294
2020-11-12 21:28:18 -08:00
Simon Pilgrim 1a62ca65c1 [KnownBits] Add KnownBits::commonBits helper. NFCI.
We have a frequent pattern where we're merging two KnownBits to get the common/shared bits, and I just fell for the gotcha where I tried to use the & operator to merge them...
2020-11-11 12:15:54 +00:00
Kerry McLaughlin ffbbfc76ca [SVE][CodeGen] Add the isTruncatingStore flag to MSCATTER
This patch adds the IsTruncatingStore flag to MaskedScatterSDNode, set by getMaskedScatter().
Updated SelectionDAGDumper::print_details for MaskedScatterSDNode to print
the details of masked scatters (is truncating, signed or scaled).

This is the first in a series of patches which adds support for scalable masked scatters.

Reviewed By: sdesmalen

Differential Revision: https://reviews.llvm.org/D90939
2020-11-11 10:58:24 +00:00
Gaurav Jain 3726b14428 [NFC] Use [MC]Register for x86 target
Differential Revision: https://reviews.llvm.org/D91161
2020-11-10 15:49:39 -08:00
Craig Topper f40925aa8b [X86] Improve lowering of fptoui
Invert the select condition when masking in the sign bit of a fptoui operation. Also, rather than lowering the sign mask to select/xor and expecting the select to get cleaned up later, directly lower to shift/xor.

Patch by Layton Kifer!

Reviewed By: craig.topper

Differential Revision: https://reviews.llvm.org/D90658
2020-11-07 23:50:03 -08:00
Simon Pilgrim 9e406ee808 [X86] Make some basic VarArgsLoweringHelper helper methods const. NFCI.
Fixes a number of cppcheck remarks.
2020-10-31 12:16:49 +00:00
serge-sans-paille 0f60bcc36c [stack-clash] Fix probing of dynamic alloca
- Perform the probing in the correct direction.
  Related to https://github.com/rust-lang/rust/pull/77885#issuecomment-711062924

- The first touch on a dynamic alloca cannot use a mov because it clobbers
  existing space. Use an xor 0 instead.

Differential Revision: https://reviews.llvm.org/D90216
2020-10-30 15:34:00 +01:00
Benjamin Kramer 35f7cbf9df [X86] Don't crash on CVTPS2PH with wide vector inputs. 2020-10-27 14:42:02 +01:00
Craig Topper 63ba82ed00 [X86] Use TargetConstant for immediates for VASTART_SAVE_XMM_REGS. 2020-10-25 12:52:56 -07:00
Craig Topper 2ed16aa66f [X86] Use TargetConstant instead of Constant for operands to X86vaarg64. 2020-10-25 12:24:59 -07:00
Craig Topper a222d832d5 [X86] Use TargetConstant for FPDiff with X86::TC_RETURN.
It's required to be a constant and can never be in a register so
make it explicit.
2020-10-25 00:29:11 -07:00
Simon Pilgrim ce356e1546 [DAG] Add BuildVectorSDNode::getRepeatedSequence helper to recognise multi-element splat patterns
Replace the X86 specific isSplatZeroExtended helper with a generic BuildVectorSDNode method.

I've just used this to simplify the X86ISD::BROADCASTM lowering so far (and remove isSplatZeroExtended), but we should be able to use this in more places to lower to complex broadcast patterns.

Differential Revision: https://reviews.llvm.org/D87930
2020-10-24 12:23:09 +01:00
Simon Pilgrim 936ef89ebe [X86] lowerShuffleWithPERMV - use MVT::changeTypeToInteger helper. NFCI. 2020-10-23 12:35:27 +01:00
Simon Pilgrim 794dc7ad26 [CodeGen] Split MVT::changeTypeToInteger() functionality from EVT::changeTypeToInteger().
Add the MVT equivalent handling for EVT changeTypeToInteger/changeVectorElementType/changeVectorElementTypeToInteger.

All the SimpleVT code already exists inside the EVT equivalents, but by splitting this out we can use these directly inside MVT types without converting to/from EVT.
2020-10-22 14:27:42 +01:00
Tianqing Wang be39a6fe6f [X86] Add User Interrupts(UINTR) instructions
For more details about these instructions, please refer to the latest
ISE document:
https://software.intel.com/en-us/download/intel-architecture-instruction-set-extensions-programming-reference.

Reviewed By: craig.topper

Differential Revision: https://reviews.llvm.org/D89301
2020-10-22 17:33:07 +08:00
Xiang1 Zhang 7c3fea7721 [X86] Support customizing stack protector guard
Reviewed By: nickdesaulniers, MaskRay

Differential Revision: https://reviews.llvm.org/D88631
2020-10-22 10:08:14 +08:00
Craig Topper 9e884169a2 [FPEnv][X86][SystemZ] Use different algorithms for i64->double uint_to_fp under strictfp to avoid producing -0.0 when rounding toward negative infinity
Some of our conversion algorithms produce -0.0 when converting unsigned i64 to double when the rounding mode is round toward negative. This switches them to other algorithms that don't have this problem. Since it is undefined behavior to change rounding mode with the non-strict nodes, this patch only changes the behavior for strict nodes.

There are still problems with unsigned i32 conversions too which I'll try to fix in another patch.

Fixes part of PR47393

Reviewed By: efriedma

Differential Revision: https://reviews.llvm.org/D87115
2020-10-21 18:12:54 -07:00
Gaurav Jain 4634ad6c0b [NFC] Set return type of getStackPointerRegisterToSaveRestore to Register
Differential Revision: https://reviews.llvm.org/D89858
2020-10-21 16:19:38 -07:00
Wang, Pengfei 3a85472af2 [X86] Fix assert fail when element type is i1.
extract_vector_elt will turn type vxi1 into i8, which triggers the assertion failure.
Since we don't really handle vxi1 cases in the code below, we can just return early here.

Reviewed By: RKSimon

Differential Revision: https://reviews.llvm.org/D89096
2020-10-20 09:26:32 +08:00
David Sherwood 47f2dc7e5f [SVE][NFC] Replace some TypeSize comparisons in non-AArch64 Targets
In most of lib/Target we know that we are not dealing with scalable
types so it's perfectly fine to replace TypeSize comparison operators
with their fixed width equivalents, making use of getFixedSize()
and so on.

Differential Revision: https://reviews.llvm.org/D89101
2020-10-15 09:01:21 +01:00
Craig Topper 1687a8d83b [X86][SelectionDAG] Add SADDO_CARRY and SSUBO_CARRY to support multipart signed add/sub overflow legalization.
This passes existing X86 tests but I'm not sure if it handles all the type
legalization cases it needs to.

Alternative to D89200

Reviewed By: efriedma

Differential Revision: https://reviews.llvm.org/D89222
2020-10-12 23:18:29 -07:00
Simon Pilgrim 913d7a110e [X86][SSE2] Use smarter instruction patterns for lowering UMIN/UMAX with v8i16.
This is my first LLVM patch, so please tell me if there are any process issues.

The main observation for this patch is that we can lower UMIN/UMAX with v8i16 by using unsigned saturated subtractions in a clever way. Previously this operation was lowered by flipping the sign bit of both inputs and the output, which turns the unsigned minimum/maximum into a signed one.

We could use this trick in reverse for lowering SMIN/SMAX with v16i8 instead. In terms of latency/throughput, the new lowering needs one large move instruction. It's just that the sign-bit flipping has an increased chance of being optimized further. This is particularly apparent in the "reduce" test cases. However due to the slight regression in the single-use case, this patch no longer proposes this.

Unfortunately this argument also applies in reverse to the new lowering of UMIN/UMAX with v8i16 which regresses the "horizontal-reduce-umax", "horizontal-reduce-umin", "vector-reduce-umin" and "vector-reduce-umax" test cases a bit with this patch. Maybe some extra casework would be possible to avoid this. However, independent of that, I believe that the benefits in the common case of just 1 to 3 chained min/max instructions outweigh the downsides in that specific case.

Patch By: @TomHender (Tom Hender) ActuallyaDeviloper

Differential Revision: https://reviews.llvm.org/D87236
2020-10-11 11:21:23 +01:00
Craig Topper 9895327914 [X86] Redefine X86ISD::PEXTRB/W and X86ISD::PINSRB/PINSRW to use a i8 TargetConstant for the immediate instead of a ptr constant.
This is more consistent with other target specific ISD opcodes that
require immediates.
2020-10-10 21:50:58 -07:00
Craig Topper 375849518d [X86] Add a X86ISD::BEXTRI to distinquish the case where the control must be a constant.
The bextri intrinsic has an ImmArg attribute which will be converted
in SelectionDAG using TargetConstant. We previously converted this
to a plain Constant to allow X86ISD::BEXTR to call SimplifyDemandedBits
on it.

But while trying to decide if D89178 was safe, I realized that
this conversion of TargetConstant to Constant would be one case
where that would break.

So this patch adds a new opcode specifically for the immediate case.
And then teaches computeKnownBits and SimplifyDemandedBits to also
handle it, but not try to SimplifyDemandedBits on it. To make up
for that, I immediately masked the constant to 16 bits when
converting from the intrinsic node to the X86ISD node.
2020-10-10 19:18:06 -07:00
Joao Moreira e0b89df2e0 [X86] Check if call is indirect before emitting NT_CALL
The notrack prefix is a relaxation of CET policies which makes it possible to indirectly call targets which do not have an ENDBR instruction in the landing address. To emit a call with this prefix, the special attribute "nocf_check" is used. When used as a function attribute, a CallInst targeting the respective function will return true for the method "doesNoCfCheck()", no matter if it is a direct call (and such should remain like this, as the information that the to-be-called function won't perform control-flow checks is useful in other contexts). Yet, when emitting an X86ISD::NT_CALL, the respective CallInst should be verified for its indirection, ensuring that the prefixed calls are only emitted in the right situations.

Update the respective testing unit to also verify for direct calls to functions with ''nocf_check'' attributes.

The bug can also be reproduced through compiling the following C code using the -fcf-protection=full flag.

int __attribute__((nocf_check)) foo(int a) {};

int main() {
  foo(42);
}

Differential Revision: https://reviews.llvm.org/D87320
2020-10-09 15:54:23 -07:00
Craig Topper f34bb06935 [X86] When expanding LCMPXCHG16B_NO_RBX in EmitInstrWithCustomInserter, directly copy address operands instead of going through X86AddressMode.
I suspect getAddressFromInstr and addFullAddress are not handling
all addresses cases properly based on a report from MaskRay.

So just copy the operands directly. This should be more efficient
anyway.
2020-10-09 11:55:24 -07:00
Fangrui Song e36a41b3cf [X86] Fix some clang-tidy bugprone-argument-comment issues 2020-10-08 15:26:50 -07:00
Craig Topper 68e1a8d207 [X86] Defer the creation of LCMPXCHG16B_SAVE_RBX until finalize-isel
We need to use LCMPXCHG16B_SAVE_RBX if RBX/EBX is being used as
the frame pointer. We previously checked for this during type
legalization, but that's too early to know for sure if the base
pointer is needed.

This patch adds a new pseudo instruction to emit from isel that
uses a virtual register for the RBX input. Then we use the custom
inserter hook to emit LCMPXCHG16B if RBX isn't needed as a base
pointer or LCMPXCHG16B_SAVE_RBX if it is.

Fixes PR42064.

Reviewed By: pengfei

Differential Revision: https://reviews.llvm.org/D88808
2020-10-07 17:00:43 -07:00
Simon Pilgrim 6c7d713cf5 [X86][SSE] combineX86ShuffleChain add 'CanonicalizeShuffleInput' helper. NFCI.
As part of PR45974, we're getting closer to not creating 'padded' vectors on-the-fly in combineX86ShufflesRecursively, and only pad the source inputs if we have a definite match inside combineX86ShuffleChain.

At the moment combineX86ShuffleChain just has to bitcast an input to the correct shuffle type, but eventually we'll need to pad them as well. So, move the bitcast into a 'CanonicalizeShuffleInput' helper for now, making the diff for future padding support a lot smaller.
2020-10-06 17:47:24 +01:00
Craig Topper 4da4e7cb20 [X86] Remove X86ISD::LCMPXCHG8_SAVE_EBX_DAG and LCMPXCHG8B_SAVE_EBX pseudo instruction
This and its friend X86ISD::LCMPXCHG8_SAVE_RBX_DAG are used if we need to avoid clobbering the frame pointer in EBX/RBX. EBX/RBX are only used as a frame pointer in 64-bit mode. In 64-bit mode we don't use CMPXCHG8B since we have a GR64 cmpxchg available. So we don't need special handling for LCMPXCHG8B.

Split from D88808

Differential Revision: https://reviews.llvm.org/D88853
2020-10-05 15:03:07 -07:00
Craig Topper 1127662c6d [SelectionDAG] Make sure FMF are propagated when getSetcc canonicalizes FP constants to RHS.
getNode handling for ISD::SETCC calls FoldSETCC which can canonicalize
FP constants to the RHS. When this happens we should create the node
with the FMF that was requested. By using FlagInserter we can ensure
any calls to getNode/getSetcc during canonicalization will also get the flags.

Differential Revision: https://reviews.llvm.org/D88063
2020-10-05 14:55:23 -07:00
Simon Pilgrim 0ac210e580 [X86] isTargetShuffleEquivalent - merge duplicate array accesses. NFCI. 2020-10-05 17:22:14 +01:00
Craig Topper 4b38ceb0eb [X86] Remove MWAITX_SAVE_EBX pseudo instruction. Always save/restore the full %rbx register even in gnux32.
ebx/rbx only needs to be saved when 64-bit registers are supported
anyway. It should be fine to save/restore the whole rbx register
even in gnux32 where the base is technically just ebx.

This matches what we do for cmpxchg16b where rbx is saved/restored
regardless of gnux32.
2020-10-04 16:28:15 -07:00
Simon Pilgrim e4e5c42896 [X86][SSE] isTargetShuffleEquivalent - ensure shuffle inputs are the correct size.
Preliminary patch for the next stage of PR45974 - we don't want to be creating 'padded' vectors on-the-fly at all in combineX86ShufflesRecursively, and only pad the source inputs if we have a definite match inside combineX86ShuffleChain.

This means that the inputs to combineX86ShuffleChain might soon be smaller than the final root value type, so we should ensure that isTargetShuffleEquivalent only matches with the inputs if they are the correct size.
2020-10-04 15:32:05 +01:00
Craig Topper a7e45ea30d [X86] Add memory operand to AESENC/AESDEC Key Locker instructions.
This removes FIXMEs from selectAddr.
2020-10-03 21:42:16 -07:00
Craig Topper 39fc4a0b0a [X86] Move ENCODEKEY128/256 handling from lowering to selection.
We should avoid emitting MachineSDNodes from lowering.

We can use the implicit def handling in InstrEmitter to avoid
manually copying from each xmm result register. We only need to
manually emit the copies for the implicit uses.
2020-10-03 18:44:53 -07:00
Craig Topper 7f3da48885 [X86] Remove X86ISD::MWAITX_DAG. Just match the intrinsic to the custom inserter pseudo instruction during isel. 2020-10-03 18:44:53 -07:00
Craig Topper adccc0bfa3 [X86] Add X86ISD opcodes for the Key Locker AESENC*KL and AESDEC*KL instructions
Instead of emitting MachineSDNodes during lowering, emit X86ISD
opcodes. These opcodes will either be selected by tablegen
patterns or custom selection code.

Emitting MachineSDNodes during lowering is uncommon so this makes
things more consistent. It also allows selectAddr to be called to
perform address matching during instruction selection.

I had trouble getting tablegen to accept XMM0-XMM7 as results in
an isel pattern for the WIDE instructions so I had to use custom
instruction selection.
2020-10-03 16:55:19 -07:00
serge-sans-paille 9573c9f2a3 Fix limit behavior of dynamic alloca
When the allocation size is 0, we shouldn't probe. Within [1, PAGE_SIZE], we
should probe once, etc.
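A sketch of the intended probe count as a function of the allocation size (illustration only):

unsigned long num_probes(unsigned long size, unsigned long page_size) {
    /* 0 -> 0 probes, [1, PAGE_SIZE] -> 1 probe, (PAGE_SIZE, 2*PAGE_SIZE] -> 2, ... */
    return (size + page_size - 1) / page_size;
}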

This fixes https://bugs.llvm.org/show_bug.cgi?id=47657

Differential Revision: https://reviews.llvm.org/D88548
2020-10-02 11:10:02 +02:00
Craig Topper d1d7fc9832 [X86] Canonicalize (x > 1) ? x : 1 -> (x >= 1) ? x : 1 for signed and unsigned to enable the use of test instructions for the compare.
This will be further canonicalized to a compare involving 0
which will enable the use of test instructions. Either using
cmovg for signed or cmovne for unsigned.

Fixes more cases for PR47049.
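A scalar sketch of the rewrite (illustration only); both forms clamp at 1, but the >= form canonicalizes to a compare against 0:

int clamp1_signed(int x) {          /* (x > 1) ? x : 1  ==  (x >= 1) ? x : 1 */
    return x >= 1 ? x : 1;          /* x >= 1  <=>  x > 0: test + cmovg */
}
unsigned clamp1_unsigned(unsigned x) {
    return x >= 1 ? x : 1;          /* x >= 1  <=>  x != 0: test + cmovne */
}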
2020-09-30 13:50:52 -07:00
Xiang1 Zhang 413577a879 [X86] Support Intel Key Locker
Key Locker provides a mechanism to encrypt and decrypt data with an AES key without having access
to the raw key value by converting AES keys into “handles”. These handles can be used to perform the
same encryption and decryption operations as the original AES keys, but they only work on the current
system and only until they are revoked. If software revokes Key Locker handles (e.g., on a reboot),
then any previous handles can no longer be used.

Reviewed By: craig.topper

Differential Revision: https://reviews.llvm.org/D88398
2020-09-30 18:08:45 +08:00
Craig Topper 618a890b72 [X86] Increase the depth threshold required to form VPERMI2W/VPERMI2B in shuffle combining
These instructions are implemented with two port 5 uops and one port 015 uop so they are more complicated than most shuffles.

This patch increases the depth threshold for when we form them during shuffle combining to try to limit increasing the number of uops especially on port 5.

Differential Revision: https://reviews.llvm.org/D88503
2020-09-29 18:37:23 -07:00
Craig Topper 82da0cabb9 [X86] Add computeKnownBits support for PEXT.
The number of zeros in the mask provides a lower bound on the number
of leading zeros in the result.
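A BMI2 sketch of the bound (illustration only): pext packs the selected bits into the low popcount(mask) bits, so everything above them is zero.

#include <immintrin.h>
unsigned pext_bound_demo(unsigned x) {
    /* mask has 4 set bits -> at most 4 result bits -> at least 28 leading zeros */
    return _pext_u32(x, 0x000000F0u);
}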
2020-09-28 22:54:07 -07:00
Craig Topper e53196b1e8 [X86] Add support for calling SimplifyDemandedBits on the input of PDEP with a constant mask.
We can do several optimizations for PDEP using computeKnownBits and SimplifyDemandedBits

- If the MSBs of the output aren't demanded, those MSBs of the mask input aren't demanded either. We need to keep the most significant demanded bit of the mask and any mask bits before it.
- The number of possible ones in the mask determines how many bits of the lsbs of the other operand are demanded. Any bits of the mask we don't demand by the previous rule should not be counted.
- The result will have zeros in any position where the mask is zero (see the sketch after this list).
- Since non-mask input bits can only be output in the original position or a higher bit position, the result will have at least as many trailing zeroes as the non-mask input.
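A BMI2 sketch of the third rule (illustration only): pdep scatters the low bits of the input to the 1-positions of the mask, so every 0-position of the mask yields a 0 bit.

#include <immintrin.h>
unsigned pdep_zero_demo(unsigned x) {
    /* only bits 8-15 can ever be set in the result */
    return _pdep_u32(x, 0x0000FF00u);
}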

Differential Revision: https://reviews.llvm.org/D87883
2020-09-28 14:21:30 -07:00
Simon Pilgrim e0820d87e3 [X86] Flip isShuffleEquivalent argument order to match isTargetShuffleEquivalent
A while ago, we converted isShuffleEquivalent/isTargetShuffleEquivalent to both use IsElementEquivalent internally.

This allows us to make the shuffle args optional like isTargetShuffleEquivalent and update foldShuffleOfHorizOp to use isShuffleEquivalent (which it should, as it's using an ISD::VECTOR_SHUFFLE mask).
2020-09-28 12:53:56 +01:00
Simon Pilgrim 6b5198f06b [X86] Simplify broadcast mask detection with isUndefOrEqual helper.
Add an additional isUndefOrEqual variant that matches an entire mask, not just a single value.
2020-09-28 12:53:56 +01:00
Simon Pilgrim 283036394e [X86][SSE] combineVectorTruncation - enable (pre-SSSE3) vXi16->vXi8 truncation.
Shuffle combining can now handle this output, and by performing this early in combineVectorTruncation we avoid a scalarization that caused a regression on D87502.
2020-09-24 15:51:36 +01:00
Craig Topper f21f835ee8 [X86] Improve demanded bits for X86ISD::BEXTR.
If the control is constant we can figure out exactly which bits
of the input are demanded.

Differential Revision: https://reviews.llvm.org/D88072
2020-09-23 10:51:02 -07:00
Craig Topper a74b1faba2 [X86] Make reduceMaskedLoadToScalarLoad/reduceMaskedStoreToScalarStore work for avx512 after type legalization.
The scalar elements of the vXi1 build_vector will have been type legalized to i8 by padding with 0s. So we can't check for all ones. Instead we should just look at bit 0 of the constant.

Differential Revision: https://reviews.llvm.org/D87863
2020-09-20 13:54:20 -07:00
Craig Topper 4e8c028158 [X86] Stop reduceMaskedLoadToScalarLoad/reduceMaskedStoreToScalarStore from creating scalar i64 load/stores in 32-bit mode
If we emit a scalar i64 load/store it will get type legalized to two i32 load/stores.

Differential Revision: https://reviews.llvm.org/D87862
2020-09-20 13:46:59 -07:00
Simon Pilgrim 0bfeede669 [X86][SSE] Fold EXTEND_VECTOR_INREG(EXTRACT_SUBVECTOR(EXTEND(X),0)) -> EXTEND_VECTOR_INREG(X) 2020-09-20 18:39:12 +01:00
Simon Pilgrim bb0078e591 [X86][SSE] Fold SIGN_EXTEND(SIGN_EXTEND_VECTOR_INREG(X)) -> SIGN_EXTEND_VECTOR_INREG(X)
It should be possible to make this generic, but we're not great at checking legality of *_EXTEND_VECTOR_INREG ops so I'm conservatively putting this inside X86ISelLowering.cpp
2020-09-20 18:39:12 +01:00
Simon Pilgrim 15c8306056 [X86][SSE] Fold EXTEND_VECTOR_INREG(EXTEND_VECTOR_INREG(X)) -> EXTEND_VECTOR_INREG(X)
It should be possible to make this generic, but we're not great at checking legality of *_EXTEND_VECTOR_INREG ops so I'm conservatively putting this inside X86ISelLowering.cpp
2020-09-20 16:33:02 +01:00
Simon Pilgrim a0c8793ce6 [X86][SSE] Enable ZERO_EXTEND_VECTOR_INREG shuffle combining on SSE41 targets.
Allows ZERO_EXTEND_VECTOR_INREG to be shuffle combined on all targets where it is legal.
2020-09-20 16:05:10 +01:00
Simon Pilgrim 2b634a9d0e [X86] Rename getExtendInVec to getEXTEND_VECTOR_INREG. NFCI.
Make it easier to find the method by naming it after the ops it actually handles. We already do this for lowering/combining.
2020-09-20 15:19:39 +01:00
Simon Pilgrim 91720ee561 [X86] combineX86ShufflesRecursively - fix use after move warning. NFCI.
After the move, WidenedMask is in an undefined state, so reduce the scope of the variable so it's reinitialized every iteration - we should still retain any memory allocation savings.
2020-09-20 14:06:50 +01:00
Simon Pilgrim e17686ae60 [X86] Rename combineExtInVec to combineEXTEND_VECTOR_INREG. NFCI.
Make it easier to find the method by naming it after the ops it actually handles. We already do this for lowering.
2020-09-20 12:16:00 +01:00
Craig Topper 721d57f952 [X86] Return from SimplifyDemandedBitsForTargetNode after calculating known bits for VSHLI/VSRAI/VSRLI.
We were breaking out of the switch which falls into the default
implementation of SimplifyDemandedBitsForTargetNode which is a
wrapper around computeKnownBits. So we end up doing the recursion
and known bits calculation all over again. Instead we should return
with the known bits we calculated in the switch.
2020-09-18 23:57:01 -07:00
Craig Topper 58ecbbcdcd [X86] Fix copy paste mistake in @ccnp flag.
We were treating @ccp and @ccnp the same.
2020-09-18 21:28:01 -07:00
Simon Pilgrim 4ebd30722a [X86][AVX] lowerBuildVectorAsBroadcast - improve BROADCASTM lowering on non-VLX targets
Broadcast to a ZMM type then extract the low subvector.
2020-09-18 19:52:02 +01:00
Simon Pilgrim ceadd98c2f [X86][AVX] lowerBuildVectorAsBroadcast - improve i64 BROADCASTM lowering on 32-bit targets
We already handle the cases where we have a 'zero extended splat' build vector (a, 0, 0, 0, a, 0, 0, 0, ...) but were missing the case where the 'a' scalar was zero-extended as well - such as i64 -> vXi64 splat cases on 32-bit targets.
2020-09-18 16:59:57 +01:00
Craig Topper 3783d3bc7b [X86] Don't match x87 register inline asm constraints unless the VT is floating point or its a clobber
The register class picked will be the RFP80 register class which has a f80 VT. The code in SelectionDAGBuilder that generates copies around inline assembly doesn't know how to handle an integer and floating point type of different bit widths.

The test case is derived from this https://godbolt.org/z/sEa659 which gcc accepts but clang crashes on. This patch just gives a more graceful error. I'm not sure if the single element struct case is special in gcc. Adding another field to the struct makes gcc reject it. If we want to support this correctly I think we need a change in the frontend to give us the true element type. Right now the frontend just realizes the constraint can take a memory argument so creates an integer type of the same size and bitcasts.

Differential Revision: https://reviews.llvm.org/D87485
2020-09-17 11:26:50 -07:00
Simon Pilgrim b2c931eff3 [X86] EmitInstrWithCustomInserter - remove redundant getDebugLoc() calls. NFCI.
Use the same DebugLoc that is obtained at the top of the method.

Fixes some Wshadow static analyzer warnings.
2020-09-16 16:29:56 +01:00
Simon Pilgrim aa4b0b755a [X86][SSE] Move VZEXT_MOVL(INSERT_SUBVECTOR(UNDEF,X,0)) handling into combineTargetShuffle.
Now that we're getting better at combining shuffles of different vector widths, this can now be performed as part of the standard target shuffle combines and isn't required for cleanup.

Exposed a minor issue in combineX86ShufflesRecursively where we failed to check if a shuffle's src ops were simple types.
2020-09-16 16:08:31 +01:00
Craig Topper 05134877e6 [X86] Use Align in reduceMaskedLoadToScalarLoad/reduceMaskedStoreToScalarStore. Correct pointer info.
If we offset the pointer, we also need to offset the pointer info.

Differential Revision: https://reviews.llvm.org/D87593
2020-09-15 11:22:02 -07:00