llvm-project

Commit Graph

Author	SHA1	Message	Date
Kerry McLaughlin	ffbbfc76ca	[SVE][CodeGen] Add the isTruncatingStore flag to MSCATTER This patch adds the IsTruncatingStore flag to MaskedScatterSDNode, set by getMaskedScatter(). Updated SelectionDAGDumper::print_details for MaskedScatterSDNode to print the details of masked scatters (is truncating, signed or scaled). This is the first in a series of patches which adds support for scalable masked scatters Reviewed By: sdesmalen Differential Revision: https://reviews.llvm.org/D90939	2020-11-11 10:58:24 +00:00
Gaurav Jain	3726b14428	[NFC] Use [MC]Register for x86 target Differential Revision: https://reviews.llvm.org/D91161	2020-11-10 15:49:39 -08:00
Craig Topper	f40925aa8b	[X86] Improve lowering of fptoui Invert the select condition when masking in the sign bit of a fptoui operation. Also, rather than lowering the sign mask to select/xor and expecting the select to get cleaned up later, directly lower to shift/xor. Patch by Layton Kifer! Reviewed By: craig.topper Differential Revision: https://reviews.llvm.org/D90658	2020-11-07 23:50:03 -08:00
Simon Pilgrim	9e406ee808	[X86] Make some basic VarArgsLoweringHelper helper methods const. NFCI. Fixes a number of cppcheck remarks.	2020-10-31 12:16:49 +00:00
serge-sans-paille	0f60bcc36c	[stack-clash] Fix probing of dynamic alloca - Perform the probing in the correct direction. Related to https://github.com/rust-lang/rust/pull/77885#issuecomment-711062924 - The first touch on a dynamic alloca cannot use a mov because it clobbers existing space. Use a xor 0 instead Differential Revision: https://reviews.llvm.org/D90216	2020-10-30 15:34:00 +01:00
Benjamin Kramer	35f7cbf9df	[X86] Don't crash on CVTPS2PH with wide vector inputs.	2020-10-27 14:42:02 +01:00
Craig Topper	63ba82ed00	[X86] Use TargetConstant for immediates for VASTART_SAVE_XMM_REGS.	2020-10-25 12:52:56 -07:00
Craig Topper	2ed16aa66f	[X86] Use TargetConstant instead of Constant for operands to X86vaarg64.	2020-10-25 12:24:59 -07:00
Craig Topper	a222d832d5	[X86] Use TargetConstant for FPDiff with X86::TC_RETURN. It's required to be a constant and can never be in a register so make it explicit.	2020-10-25 00:29:11 -07:00
Simon Pilgrim	ce356e1546	[DAG] Add BuildVectorSDNode::getRepeatedSequence helper to recognise multi-element splat patterns Replace the X86 specific isSplatZeroExtended helper with a generic BuildVectorSDNode method. I've just used this to simplify the X86ISD::BROADCASTM lowering so far (and remove isSplatZeroExtended), but we should be able to use this in more places to lower to complex broadcast patterns. Differential Revision: https://reviews.llvm.org/D87930	2020-10-24 12:23:09 +01:00
Simon Pilgrim	936ef89ebe	[X86] lowerShuffleWithPERMV - use MVT::changeTypeToInteger helper. NFCI.	2020-10-23 12:35:27 +01:00
Simon Pilgrim	794dc7ad26	[CodeGen] Split MVT::changeTypeToInteger() functionality from EVT::changeTypeToInteger(). Add the MVT equivalent handling for EVT changeTypeToInteger/changeVectorElementType/changeVectorElementTypeToInteger. All the SimpleVT code already exists inside the EVT equivalents, but by splitting this out we can use these directly inside MVT types without converting to/from EVT.	2020-10-22 14:27:42 +01:00
Tianqing Wang	be39a6fe6f	[X86] Add User Interrupts(UINTR) instructions For more details about these instructions, please refer to the latest ISE document: https://software.intel.com/en-us/download/intel-architecture-instruction-set-extensions-programming-reference. Reviewed By: craig.topper Differential Revision: https://reviews.llvm.org/D89301	2020-10-22 17:33:07 +08:00
Xiang1 Zhang	7c3fea7721	[X86] Support customizing stack protector guard Reviewed By: nickdesaulniers, MaskRay Differential Revision: https://reviews.llvm.org/D88631	2020-10-22 10:08:14 +08:00
Craig Topper	9e884169a2	[FPEnv][X86][SystemZ] Use different algorithms for i64->double uint_to_fp under strictfp to avoid producing -0.0 when rounding toward negative infinity Some of our conversion algorithms produce -0.0 when converting unsigned i64 to double when the rounding mode is round toward negative. This switches them to other algorithms that don't have this problem. Since it is undefined behavior to change rounding mode with the non-strict nodes, this patch only changes the behavior for strict nodes. There are still problems with unsigned i32 conversions too which I'll try to fix in another patch. Fixes part of PR47393 Reviewed By: efriedma Differential Revision: https://reviews.llvm.org/D87115	2020-10-21 18:12:54 -07:00
Gaurav Jain	4634ad6c0b	[NFC] Set return type of getStackPointerRegisterToSaveRestore to Register Differential Revision: https://reviews.llvm.org/D89858	2020-10-21 16:19:38 -07:00
Wang, Pengfei	3a85472af2	[X86] Fix assert fail when element type is i1. extract_vector_elt will turn type vxi1 into i8, which triggers the assertion fail. Since we don't really handle vxi1 cases in below code, we can just return from here. Reviewed By: RKSimon Differential Revision: https://reviews.llvm.org/D89096	2020-10-20 09:26:32 +08:00
David Sherwood	47f2dc7e5f	[SVE][NFC] Replace some TypeSize comparisons in non-AArch64 Targets In most of lib/Target we know that we are not dealing with scalable types so it's perfectly fine to replace TypeSize comparison operators with their fixed width equivalents, making use of getFixedSize() and so on. Differential Revision: https://reviews.llvm.org/D89101	2020-10-15 09:01:21 +01:00
Craig Topper	1687a8d83b	[X86][SelectionDAG] Add SADDO_CARRY and SSUBO_CARRY to support multipart signed add/sub overflow legalization. This passes existing X86 test but I'm not sure if it handles all type legalization cases it needs to. Alternative to D89200 Reviewed By: efriedma Differential Revision: https://reviews.llvm.org/D89222	2020-10-12 23:18:29 -07:00
Simon Pilgrim	913d7a110e	[X86][SSE2] Use smarter instruction patterns for lowering UMIN/UMAX with v8i16. This is my first LLVM patch, so please tell me if there are any process issues. The main observation for this patch is that we can lower UMIN/UMAX with v8i16 by using unsigned saturated subtractions in a clever way. Previously this operation was lowered by turning the signbit of both inputs and the output which turns the unsigned minimum/maximum into a signed one. We could use this trick in reverse for lowering SMIN/SMAX with v16i8 instead. In terms of latency/throughput this is the needs one large move instruction. It's just that the sign bit turning has an increased chance of being optimized further. This is particularly apparent in the "reduce" test cases. However due to the slight regression in the single use case, this patch no longer proposes this. Unfortunately this argument also applies in reverse to the new lowering of UMIN/UMAX with v8i16 which regresses the "horizontal-reduce-umax", "horizontal-reduce-umin", "vector-reduce-umin" and "vector-reduce-umax" test cases a bit with this patch. Maybe some extra casework would be possible to avoid this. However independent of that I believe that the benefits in the common case of just 1 to 3 chained min/max instructions outweighs the downsides in that specific case. Patch By: @TomHender (Tom Hender) ActuallyaDeviloper Differential Revision: https://reviews.llvm.org/D87236	2020-10-11 11:21:23 +01:00
Craig Topper	9895327914	[X86] Redefine X86ISD::PEXTRB/W and X86ISD::PINSRB/PINSRW to use a i8 TargetConstant for the immediate instead of a ptr constant. This is more consistent with other target specific ISD opcodes that require immediates.	2020-10-10 21:50:58 -07:00
Craig Topper	375849518d	[X86] Add a X86ISD::BEXTRI to distinquish the case where the control must be a constant. The bextri intrinsic has a ImmArg attribute which will be converted in SelectionDAG using TargetConstant. We previously converted this to a plain Constant to allow X86ISD::BEXTR to call SimplifyDemandedBits on it. But while trying to decide if D89178 was safe, I realized that this conversion of TargetConstant to Constant would be one case where that would break. So this patch adds a new opcode specifically for the immediate case. And then teaches computeKnownBits and SimplifyDemandedBits to also handle it, but not try to SimplifyDemandedBits on it. To make up for that, I immediately masked the constant to 16 bits when converting from the intrinsic node to the X86ISD node.	2020-10-10 19:18:06 -07:00
Joao Moreira	e0b89df2e0	[X86] Check if call is indirect before emitting NT_CALL The notrack prefix is a relaxation of CET policies which makes it possible to indirectly call targets which do not have an ENDBR instruction in the landing address. To emit a call with this prefix, the special attribute "nocf_check" is used. When used as a function attribute, a CallInst targeting the respective function will return true for the method "doesNoCfCheck()", no matter if it is a direct call (and such should remain like this, as the information that the to-be-called function won't perform control-flow checks is useful in other contexts). Yet, when emitting an X86ISD::NT_CALL, the respective CallInst should be verified for its indirection, allowing that the prefixed calls are only emitted in the right situations. Update the respective testing unit to also verify for direct calls to functions with ''nocf_check'' attributes. The bug can also be reproduced through compiling the following C code using the -fcf-protection=full flag. int __attribute__((nocf_check)) foo(int a) {}; int main() { foo(42); } Differential Revision: https://reviews.llvm.org/D87320	2020-10-09 15:54:23 -07:00
Craig Topper	f34bb06935	[X86] When expanding LCMPXCHG16B_NO_RBX in EmitInstrWithCustomInserter, directly copy address operands instead of going through X86AddressMode. I suspect getAddressFromInstr and addFullAddress are not handling all addresses cases properly based on a report from MaskRay. So just copy the operands directly. This should be more efficient anyway.	2020-10-09 11:55:24 -07:00
Fangrui Song	e36a41b3cf	[X86] Fix some clang-tidy bugprone-argument-comment issues	2020-10-08 15:26:50 -07:00
Craig Topper	68e1a8d207	[X86] Defer the creation of LCMPXCHG16B_SAVE_RBX until finalize-isel We need to use LCMPXCHG16B_SAVE_RBX if RBX/EBX is being used as the frame pointer. We previously checked for this during type legalization, but that's too early to know for sure if the base pointer is needed. This patch adds a new pseudo instruction to emit from isel that uses a virtual register for the RBX input. Then we use the custom inserter hook to emit LCMPXCHG16B if RBX isn't needed as a base pointer or LCMPXCHG16B_SAVE_RBX if it is. Fixes PR42064. Reviewed By: pengfei Differential Revision: https://reviews.llvm.org/D88808	2020-10-07 17:00:43 -07:00
Simon Pilgrim	6c7d713cf5	[X86][SSE] combineX86ShuffleChain add 'CanonicalizeShuffleInput' helper. NFCI. As part of PR45974, we're getting closer to not creating 'padded' vectors on-the-fly in combineX86ShufflesRecursively, and only pad the source inputs if we have a definite match inside combineX86ShuffleChain. At the moment combineX86ShuffleChain just has to bitcast an input to the correct shuffle type, but eventually we'll need to pad them as well. So, move the bitcast into a 'CanonicalizeShuffleInput helper for now, making the diff for future padding support a lot smaller.	2020-10-06 17:47:24 +01:00
Craig Topper	4da4e7cb20	[X86] Remove X86ISD::LCMPXCHG8_SAVE_EBX_DAG and LCMPXCHG8B_SAVE_EBX pseudo instruction This and its friend X86ISD::LCMPXCHG8_SAVE_RBX_DAG are used if we need to avoid clobbering the frame pointer in EBX/RBX. EBX/RBX are only used a frame pointer in 64-bit mode. In 64-bit mode we don't use CMPXCHG8B since we have a GR64 cmpxchg available. So we don't need special handling for LCMPXCHG8B. Split from D88808 Differential Revision: https://reviews.llvm.org/D88853	2020-10-05 15:03:07 -07:00
Craig Topper	1127662c6d	[SelectionDAG] Make sure FMF are propagated when getSetcc canonicalizes FP constants to RHS. getNode handling for ISD:SETCC calls FoldSETCC which can canonicalize FP constants to the RHS. When this happens we should create the node with the FMF that was requested. By using FlagInserter when can ensure any calls to getNode/getSetcc during canonicalization will also get the flags. Differential Revision: https://reviews.llvm.org/D88063	2020-10-05 14:55:23 -07:00
Simon Pilgrim	0ac210e580	[X86] isTargetShuffleEquivalent - merge duplicate array accesses. NFCI.	2020-10-05 17:22:14 +01:00
Craig Topper	4b38ceb0eb	[X86] Remove MWAITX_SAVE_EBX pseudo instruction. Always save/restore the full %rbx register even in gnux32. ebx/rbx only needs to be saved when 64-bit registers are supported anyway. It should be fine to save/restore the whole rbx register even in gnux32 where the base is technically just ebx. This matches what we do for cmpxchg16b where rbx is saved/restored regardless of gnux32.	2020-10-04 16:28:15 -07:00
Simon Pilgrim	e4e5c42896	[X86][SSE] isTargetShuffleEquivalent - ensure shuffle inputs are the correct size. Preliminary patch for the next stage of PR45974 - we don't want to be creating 'padded' vectors on-the-fly at all in combineX86ShufflesRecursively, and only pad the source inputs if we have a definite match inside combineX86ShuffleChain. This means that the inputs to combineX86ShuffleChain might soon be smaller than the final root value type, so we should ensure that isTargetShuffleEquivalent only matches with the inputs if they are the correct size.	2020-10-04 15:32:05 +01:00
Craig Topper	a7e45ea30d	[X86] Add memory operand to AESENC/AESDEC Key Locker instructions. This removes FIXMEs from selectAddr.	2020-10-03 21:42:16 -07:00
Craig Topper	39fc4a0b0a	[X86] Move ENCODEKEY128/256 handling from lowering to selection. We should avoid emitting MachineSDNodes from lowering. We can use the the implicit def handling in InstrEmitter to avoid manually copying from each xmm result register. We only need to manually emit the copies for the implicit uses.	2020-10-03 18:44:53 -07:00
Craig Topper	7f3da48885	[X86] Remove X86ISD::MWAITX_DAG. Just match the intrinsic to the custom inserter pseudo instruction during isel.	2020-10-03 18:44:53 -07:00
Craig Topper	adccc0bfa3	[X86] Add X86ISD opcodes for the Key Locker AESENCKL and AESDECKL instructions Instead of emitting MachineSDNodes during lowering, emit X86ISD opcodes. These opcodes will either be selected by tablegen patterns or custom selection code. Emitting MachineSDNodes during lowering is uncommon so this makes things more consistent. It also allows selectAddr to be called to perform address matching during instruction selection. I had trouble getting tablegen to accept XMM0-XMM7 as results in an isel pattern for the WIDE instructions so I had to use custom instruction selection.	2020-10-03 16:55:19 -07:00
serge-sans-paille	9573c9f2a3	Fix limit behavior of dynamic alloca When the allocation size is 0, we shouldn't probe. Within [1, PAGE_SIZE], we should probe once etc. This fixes https://bugs.llvm.org/show_bug.cgi?id=47657 Differential Revision: https://reviews.llvm.org/D88548	2020-10-02 11:10:02 +02:00
Craig Topper	d1d7fc9832	[X86] Canonicalize (x > 1) ? x : 1 -> (x >= 1) ? x : 1 for sign and unsigned to enable the use of test instructions for the compare. This will be further canonicalized to a compare involving 0 which will enable the use of test instructions. Either using cmovg for signed for cmovne for unsigned. Fixes more case for PR47049	2020-09-30 13:50:52 -07:00
Xiang1 Zhang	413577a879	[X86] Support Intel Key Locker Key Locker provides a mechanism to encrypt and decrypt data with an AES key without having access to the raw key value by converting AES keys into “handles”. These handles can be used to perform the same encryption and decryption operations as the original AES keys, but they only work on the current system and only until they are revoked. If software revokes Key Locker handles (e.g., on a reboot), then any previous handles can no longer be used. Reviewed By: craig.topper Differential Revision: https://reviews.llvm.org/D88398	2020-09-30 18:08:45 +08:00
Craig Topper	618a890b72	[X86] Increase the depth threshold required to form VPERMI2W/VPERMI2B in shuffle combining These instructions are implemented with two port 5 uops and one port 015 uop so they are more complicated that most shuffles. This patch increases the depth threshold for when we form them during shuffle combining to try to limit increasing the number of uops especially on port 5. Differential Revision: https://reviews.llvm.org/D88503	2020-09-29 18:37:23 -07:00
Craig Topper	82da0cabb9	[X86] Add computeKnownBits support for PEXT. The number of zeros in the mask provides a lower bound on the number of leading zeros in the result.	2020-09-28 22:54:07 -07:00
Craig Topper	e53196b1e8	[X86] Add support for calling SimplifyDemandedBits on the input of PDEP with a constant mask. We can do several optimizations for PDEP using computeKnownBits and SimplifyDemandedBits -If the MSBs of the output aren't demanded, those MSBs of the mask input aren't demanded either. We need to keep the most significant demanded bit of the mask and any mask bits before it. -The number of possible ones in the mask determines how many bits of the lsbs of the other operand are demanded. Any bits of the mask we don't demand by the previous rule should not be counted. -The result will have zeros in any position that the mask is zero. -Since non-mask input bits can only be output in the original position or a higher bit position, the result will have at least as many trailing zeroes as the non-mask input. Differential Revision: https://reviews.llvm.org/D87883	2020-09-28 14:21:30 -07:00
Simon Pilgrim	e0820d87e3	[X86] Flip isShuffleEquivalent argument order to match isTargetShuffleEquivalent A while ago, we converted isShuffleEquivalent/isTargetShuffleEquivalent to both use IsElementEquivalent internally. This allows us to make the shuffle args optional like isTargetShuffleEquivalent and update foldShuffleOfHorizOp to use isShuffleEquivalent (which it should as its using a ISD::VECTOR_SHUFFLE mask).	2020-09-28 12:53:56 +01:00
Simon Pilgrim	6b5198f06b	[X86] Simplify broadcast mask detection with isUndefOrEqual helper. Add an additional isUndefOrEqual variant that matches an entire mask, not just a single value.	2020-09-28 12:53:56 +01:00
Simon Pilgrim	283036394e	[X86][SSE] combineVectorTruncation - enable (pre-SSSE3) vXi16->vXi8 truncation. Shuffle combining can now handle this output, and by performing this early in combineVectorTruncation we avoid a scalarization that caused a regression on D87502.	2020-09-24 15:51:36 +01:00
Craig Topper	f21f835ee8	[X86] Improve demanded bits for X86ISD::BEXTR. If the control is constant we can figure out exactly which bits of the input are demanded. Differential Revision: https://reviews.llvm.org/D88072	2020-09-23 10:51:02 -07:00
Craig Topper	a74b1faba2	[X86] Make reduceMaskedLoadToScalarLoad/reduceMaskedStoreToScalarStore work for avx512 after type legalization. The scalar elements of the vXi1 build_vector will have been type legalized to i8 by padding with 0s. So we can't check for all ones. Instead we should just look at bit 0 of the constant. Differential Revision: https://reviews.llvm.org/D87863	2020-09-20 13:54:20 -07:00
Craig Topper	4e8c028158	[X86] Stop reduceMaskedLoadToScalarLoad/reduceMaskedStoreToScalarStore from creating scalar i64 load/stores in 32-bit mode If we emit a scalar i64 load/store it will get type legalized to two i32 load/stores. Differential Revision: https://reviews.llvm.org/D87862	2020-09-20 13:46:59 -07:00
Simon Pilgrim	0bfeede669	[X86][SSE] Fold EXTEND_VECTOR_INREG(EXTRACT_SUBVECTOR(EXTEND(X),0)) -> EXTEND_VECTOR_INREG(X)	2020-09-20 18:39:12 +01:00
Simon Pilgrim	bb0078e591	[X86][SSE] Fold SIGN_EXTEND(SIGN_EXTEND_VECTOR_INREG(X)) -> SIGN_EXTEND_VECTOR_INREG(X) It should be possible to make this generic, but we're not great at checking legality of *_EXTEND_VECTOR_INREG ops so I'm conservatively putting this inside X86ISelLowering.cpp	2020-09-20 18:39:12 +01:00
Simon Pilgrim	15c8306056	[X86][SSE] Fold EXTEND_VECTOR_INREG(EXTEND_VECTOR_INREG(X)) -> EXTEND_VECTOR_INREG(X) It should be possible to make this generic, but we're not great at checking legality of *_EXTEND_VECTOR_INREG ops so I'm conservatively putting this inside X86ISelLowering.cpp	2020-09-20 16:33:02 +01:00
Simon Pilgrim	a0c8793ce6	[X86][SSE] Enable ZERO_EXTEND_VECTOR_INREG shuffle combining on SSE41 targets. Allows ZERO_EXTEND_VECTOR_INREG to be shuffle combined on all targets where it is legal.	2020-09-20 16:05:10 +01:00
Simon Pilgrim	2b634a9d0e	[X86] Rename getExtendInVec to getEXTEND_VECTOR_INREG. NFCI. Make it easier to find the method by naming it after the ops it actually handles. We already do this for lowering/combining.	2020-09-20 15:19:39 +01:00
Simon Pilgrim	91720ee561	[X86] combineX86ShufflesRecursively - fix use after move warning. NFCI. After moving WidenedMask is in an undefined state, so reduce scope of the variable so its reinitialized every iteration - we should still retain any memory allocation savings.	2020-09-20 14:06:50 +01:00
Simon Pilgrim	e17686ae60	[X86] Rename combineExtInVec to combineEXTEND_VECTOR_INREG. NFCI. Make it easier to find the method by naming it after the ops it actually handles. We already do this for lowering.	2020-09-20 12:16:00 +01:00
Craig Topper	721d57f952	[X86] Return from SimplifyDemandedBitsForTargetNode after calculating known bits for VSHLI/VSRAI/VSRLI. We were breaking out of the switch which falls into the default implementation of SimplifyDemandedBitsForTargetNode which is a wrapper around computeKnownBits. So we end up doing the recursion and known bits calculation all over again. Instead we should return with the known bits we calculated in the switch.	2020-09-18 23:57:01 -07:00
Craig Topper	58ecbbcdcd	[X86] Fix copy paste mistake in @ccnp flag. We were treating @ccp and @ccnp the same.	2020-09-18 21:28:01 -07:00
Simon Pilgrim	4ebd30722a	[X86][AVX] lowerBuildVectorAsBroadcast - improve BROADCASTM lowering on non-VLX targets Broadcast to a ZMM type then extract the low subvector.	2020-09-18 19:52:02 +01:00
Simon Pilgrim	ceadd98c2f	[X86][AVX] lowerBuildVectorAsBroadcast - improve i64 BROADCASTM lowering on 32-bit targets We already handle the the cases where we have a 'zero extended splat' build vector (a, 0, 0, 0, a, 0, 0, 0, ...) but were missing the case where the 'a' scalar was zero-extended as well - such as i64 -> vXi64 splat cases on 32-bit targets.	2020-09-18 16:59:57 +01:00
Craig Topper	3783d3bc7b	[X86] Don't match x87 register inline asm constraints unless the VT is floating point or its a clobber The register class picked will be the RFP80 register class which has a f80 VT. The code in SelectionDAGBuilder that generates copies around inline assembly doesn't know how to handle an integer and floating point type of different bit widths. The test case is derived from this https://godbolt.org/z/sEa659 which gcc accepts but clang crashes on. This patch just gives a more graceful error. I'm not sure if the single element struct case is special in gcc. Adding another field to the struct makes gcc reject it. If we want to support this correctly I think we need a change in the frontend to give us the true element type. Right now the frontend just realizes the constraint can take a memory argument so creates an integer type of the same size and bitcasts. Differential Revision: https://reviews.llvm.org/D87485	2020-09-17 11:26:50 -07:00
Simon Pilgrim	b2c931eff3	[X86] EmitInstrWithCustomInserter - remove redundant getDebugLoc() calls. NFCI. Use the same DebugLoc that is called at the top of the method. Fixes some Wshadow static analyzer warnings.	2020-09-16 16:29:56 +01:00
Simon Pilgrim	aa4b0b755a	[X86][SSE] Move VZEXT_MOVL(INSERT_SUBVECTOR(UNDEF,X,0)) handling into combineTargetShuffle. Now that we're getting better at combining shuffles of different vector widths, this can now be performed as part of the standard target shuffle combines and isn't required for cleanup. Exposed a minor issue in combineX86ShufflesRecursively where we failed to check if a shuffle's src ops were simple types.	2020-09-16 16:08:31 +01:00
Craig Topper	05134877e6	[X86] Use Align in reduceMaskedLoadToScalarLoad/reduceMaskedStoreToScalarStore. Correct pointer info. If we offset the pointer, we also need to offset the pointer info Differential Revision: https://reviews.llvm.org/D87593	2020-09-15 11:22:02 -07:00
Simon Pilgrim	a43e68b58b	[X86][AVX] lowerShuffleWithSHUFPS - handle missed canonicalization cases. PR47534 exposes a case where calling lowerShuffleWithSHUFPS directly from a derived repeated mask (found by is128BitLaneRepeatedShuffleMask) results in us using an non-canonicalized mask. The missed canonicalization in this case is trivial - just commute the mask so we have more (swapped) LHS than RHS references so lowerShuffleWithSHUFPS can handle it.	2020-09-15 17:31:08 +01:00
Simon Pilgrim	fc446935d7	[X86] detectAVGPattern - accept non-pow2 vectors by padding. Drop the pow2 vector limitation for AVG generation by padding the vector to the next pow2, creating the PAVG nodes and then extracting the final subvector. Fixes some poor codegen that has been annoying me for years.....	2020-09-15 10:07:03 +01:00
Craig Topper	c193a689b4	[SelectionDAG] Use Align/MaybeAlign in calls to getLoad/getStore/getExtLoad/getTruncStore. The versions that take 'unsigned' will be removed in the future. I tried to use getOriginalAlign instead of getAlign in some places. getAlign factors in the minimum alignment implied by the offset in the pointer info. Since we're also passing the pointer info we can use the original alignment. Reviewed By: arsenm Differential Revision: https://reviews.llvm.org/D87592	2020-09-14 13:54:50 -07:00
Craig Topper	758732a34e	[X86] Use ISD::PARITY directly instead of emitting CTPOP and AND from combineHorizontalPredicateResult. We have a PARITY ISD node now so might as well use it. It will get re-expanded later.	2020-09-12 20:01:17 -07:00
Craig Topper	ad3d6f993d	[SelectionDAG][X86][ARM][AArch64] Add ISD opcode for __builtin_parity. Expand it to shifts and xors. Clang emits (and (ctpop X), 1) for __builtin_parity. If ctpop isn't natively supported by the target, this leads to poor codegen due to the expansion of ctpop being more complex than what is needed for parity. This adds a DAG combine to convert the pattern to ISD::PARITY before operation legalization. Type legalization is updated to handled Expanding and Promoting this operation. If after type legalization, CTPOP is supported for this type, LegalizeDAG will turn it back into CTPOP+AND. Otherwise LegalizeDAG will emit a series of shifts and xors followed by an AND with 1. I've avoided vectors in this patch to avoid more legalization complexity for this patch. X86 previously had a custom DAG combiner for this. This is now moved to Custom lowering for the new opcode. There is a minor regression in vector-reduce-xor-bool.ll, but a follow up patch can easily fix that. Fixes PR47433 Reviewed By: efriedma Differential Revision: https://reviews.llvm.org/D87209	2020-09-12 11:42:18 -07:00
Simon Pilgrim	35dc91aee2	[X86][SSE] lowerShuffleAsDecomposedShuffleBlend - support decomposed unpacks for some vXi8/vXi16 cases Follow up to D86429 to handle the remaining regressions. This patch generalizes lowerShuffleAsDecomposedShuffleBlend to lowerShuffleAsDecomposedShuffleMerge, and attempts to use an UNPCKL shuffle mask instead of a blend for the cases where the inputs are coming from alternating vXi8/vXi16 sources. Technically they don't have to be alternating (just as long as they can fit into a lower lane half for the unpack) but I didn't find as many general cases and it needed a lot more of the function to be altered. For vXi32/vXi64 cases this could still be beneficial but in most cases the existing permute+blend approach was better. Differential Revision: https://reviews.llvm.org/D87405	2020-09-12 13:39:33 +01:00
Simon Pilgrim	70a05ee288	[X86] Keep variables from getDataLayout/getDebugLoc calls as const reference. NFCI. These are only ever used as references in the called functions, so just pass the original reference instead of copying it.	2020-09-11 10:44:42 +01:00
Simon Pilgrim	b585fdae24	[X86] Use Register instead of unsigned. NFCI. Fixes llvm-prefer-register-over-unsigned clang-tidy warnings.	2020-09-10 16:05:33 +01:00
Simon Pilgrim	fc49abee56	[X86][SSE] lowerShuffleAsSplitOrBlend always returns a shuffle. lowerShuffleAsSplitOrBlend always returns a target shuffle result (and is the default operation for lowering some shuffle types), so we don't need to check for null.	2020-09-10 11:45:08 +01:00
Hiroshi Yamauchi	0ab6a15698	[X86] Add support for using fast short rep mov for memcpy lowering. Disabled by default behind an option. Differential Revision: https://reviews.llvm.org/D86883	2020-09-09 12:46:40 -07:00
Craig Topper	b1e68f885b	[SelectionDAGBuilder] Pass fast math flags to getNode calls rather than trying to set them after the fact.: This removes the after the fact FMF handling from D46854 in favor of passing fast math flags to getNode. This should be a superset of D87130. This required adding a SDNodeFlags to SelectionDAG::getSetCC. Now we manage to contant fold some stuff undefs during the initial getNode that we don't do in later DAG combines. Differential Revision: https://reviews.llvm.org/D87200	2020-09-08 15:27:21 -07:00
Craig Topper	da79b1eecc	[SelectionDAG][X86][ARM] Teach ExpandIntRes_ABS to use sra+add+xor expansion when ADDCARRY is supported. Rather than using SELECT instructions, use SRA, UADDO/ADDCARRY and XORs to expand ABS. This is the multi-part version of the sequence we use in LegalizeDAG. It's also the same as the Custom sequence uses for i64 on 32-bit and i128 on 64-bit. So we can remove the X86 customization. Reviewed By: RKSimon Differential Revision: https://reviews.llvm.org/D87215	2020-09-07 13:15:26 -07:00
Craig Topper	01b3e16757	[X86] Use the same sequence for i128 ISD::ABS on 64-bit targets as we use for i64 on 32-bit targets. Differential Revision: https://reviews.llvm.org/D87214	2020-09-07 11:14:05 -07:00
Simon Pilgrim	9de0a3da6a	[X86][SSE] Don't use LowerVSETCCWithSUBUS for unsigned compare with +ve operands (PR47448) We already simplify the unsigned comparisons if we've found the operands are non-negative, but we were still calling LowerVSETCCWithSUBUS which resulted in the PR47448 regressions.	2020-09-07 16:11:40 +01:00
Simon Pilgrim	9b645ebfff	[X86][AVX] Use lowerShuffleWithPERMV in shuffle combining to support non-VLX targets lowerShuffleWithPERMV allows us to use the ZMM variants for 128/256-bit variable shuffles on non-VLX AVX512 targets. This is another step towards shuffle combining through between vector widths - we still end up with an annoying regression (combine_vpermilvar_vperm2f128_zero_8f32) but we're going in the right direction....	2020-09-07 12:50:50 +01:00
Simon Pilgrim	71dfdbe2c7	[X86] getFauxShuffleMask - handle insert_subvector(zero, sub, C) Directly use SM_SentinelZero elements if we're (widening)inserting into a zero vector.	2020-09-07 11:10:40 +01:00
Simon Pilgrim	ecac5c2808	[X86][AVX] lowerShuffleWithPERMV - adjust binary shuffle masks to account for widening on non-VLX targets rGabd33bf5eff2 enabled us to pad 128/256-bit shuffles to 512-bit on non-VLX targets, but wasn't updating binary shuffles to account for the new vector width.	2020-09-06 14:52:25 +01:00
Craig Topper	35b35a373d	[X86] Prevent shuffle combining from creating an identical X86ISD::SHUF128. This can cause an infinite loop if SimplifiedDemandedElts asks for the node to replace itself. A similar protection exists in other places in shuffle combining. Fixes ISPC https://github.com/ispc/ispc/issues/1864	2020-09-04 14:12:49 -07:00
Simon Pilgrim	740625fecd	[X86] Make lowerShuffleAsLanePermuteAndPermute use sublanes on AVX2 Extends lowerShuffleAsLanePermuteAndPermute to search for opportunities to use vpermq (64-bit cross-lane shuffle) and vpermd (32-bit cross-lane shuffle) to get elements into the correct lane, in addition to the 128-bit full-lane permutes it previously searched for. This is especially helpful in cross-lane byte shuffles, where the alternative tends to be "vpshufb both lanes separately and blend them with a vpblendvb", which is very expensive, especially on Haswell where vpblendvb uses the same execution port as all the shuffles. Addresses PR47262 Patch By: @TellowKrinkle (TellowKrinkle) Differential Revision: https://reviews.llvm.org/D86429	2020-09-04 11:41:26 +01:00
Craig Topper	0851350557	[X86] Update stale comment. NFC The optimization in ExpandIntOp_UINT_TO_FP was removed in D72728 in January 2020.	2020-09-03 16:19:10 -07:00
Simon Pilgrim	e56edb801b	[X86][SSE] Fold select(X > -1, A, B) -> select(0 > X, B, A) (PR47404) Help PBLENDVB peek through to the sign bit source of the selection mask by swapping the select condition and inputs.	2020-09-03 13:02:08 +01:00
Simon Pilgrim	888049b97a	[X86][SSE] Fold vselect(pshufb,pshufb) -> or(pshufb,pshufb) If the PSHUFBs have no other uses, then we can force the unselected elements to zero to OR them instead, avoiding both an extra mask load and a costly variable blend. Eventually we should try to bring this into shuffle combining, once we can more easily convert between shuffles + select patterns.	2020-09-02 16:55:00 +01:00
Martin Storsjö	4820af2bfc	[X86] Remove superfluous trailing semicolons, fixing warnings. NFC.	2020-09-02 11:43:27 +03:00
Simon Pilgrim	21d02dc595	[X86][SSE] SimplifyDemandedVectorEltsForTargetNode - add general shuffle combining support This patch uses partial DemandedElts masks to further simplify target shuffle chains and finally starts making target shuffle combining part of SimplifyDemandedBits/SimplifyDemandedVectorElts. We already manage this for Depth == 0 cases, where combineX86ShuffleChain would early-out if the shuffle combined to the same op, but the patch generalizes this by manipulating the depth handling of combineX86ShufflesRecursively - calling with a new Depth = 0 and reducing the maximum shuffle combine depth accordingly. Differential Revision: https://reviews.llvm.org/D66004	2020-09-02 09:24:46 +01:00
Pierre Gousseau	cda6b09242	[X86] Make sure we do not clobber RBX with mwaitx when used as a base pointer. mwaitx uses EBX as one of its argument. Using this instruction clobbers RBX as it is defined to hold one of the input. When the backend uses dynamically allocated stack, RBX is used as a reserved register for the base pointer. This patch is adapted from @qcolombet patch for cmpxchg at r263325. This fixes PR43528. Reviewed By: craig.topper Differential Revision: https://reviews.llvm.org/D73475	2020-08-26 11:20:31 +01:00
Craig Topper	b8ec8f5776	[X86] Remove extra getOperand(0) call from recently introduced store(extract_element(vtrunc)) to truncated store combine. The IsExtractedElement already called getOperand(0) so Extract here is the source vector. We shouldn't call getOperand(0). This worked for the original test cases because the result was a bitcast so the getOperand(0) accidently peeked through the bitcast which is what we wanted. In the failing case here, the operand turns out to be undef so the getOperand(0) asserts because undef has no operands. Fixes https://bugs.chromium.org/p/oss-fuzz/issues/detail?id=25184 Differential Revision: https://reviews.llvm.org/D86428	2020-08-25 16:16:54 -07:00
Simon Pilgrim	057bdd63a4	[X86][AVX] lowerShuffleWithVPMOV - minor refactor to more closely match lowerShuffleAsVTRUNC Replace isBuildVectorAllZeros check by using the Zeroable bitmask instead.	2020-08-19 14:34:32 +01:00
Simon Pilgrim	9fee2bad6d	[X86] lowerShuffleWithVPMOV - remove unnecessary shuffle commutation. NFCI. canonicalizeShuffleMaskWithCommute should have already ensured the lower elements are from V1, we do have test coverage for this already.	2020-08-19 13:28:59 +01:00
Simon Pilgrim	b61cef3a92	[X86][AVX] getAVX512TruncNode - don't truncate from illegal vector widths. Thanks to @fhahn for the test case.	2020-08-19 13:00:26 +01:00
Simon Pilgrim	80a0dc59b7	[X86][AVX] computeKnownBitsForTargetNode - add VTRUNC/VTRUNCS/VTRUNCUS known zero upper elements handling. Like many of the AVX512 conversion ops, the VTRUNC ops guarantee the upper destination elements are zero.	2020-08-19 11:39:27 +01:00
Simon Pilgrim	46fc9a0dfc	[X86][AVX] Fold store(extract_element(vtrunc)) to truncated store Add handling for storing the extracted lower (truncated bits) element from a X86ISD::VTRUNC node - this can be lowered to a generic truncated store directly. Differential Revision: https://reviews.llvm.org/D86158	2020-08-19 11:10:20 +01:00
Simon Pilgrim	11ff5176c4	[X86][AVX] lowerShuffleWithVPMOV - add non-VLX support. We can efficiently handle non-VLX cases now that we have the getAVX512TruncNode helper.	2020-08-18 17:51:14 +01:00
Simon Pilgrim	abd33bf5ef	[X86][AVX] lowerShuffleWithPERMV - pad 128/256-bit shuffles on non-VLX targets Allow non-VLX targets to use 512-bits VPERMV/VPERMV3 for 128/256-bit shuffles. TBH I'm not sure these targets actually exist in the wild, but we're testing for them and its good test coverage for shuffle lowering/combines across different subvector widths.	2020-08-18 15:46:02 +01:00
Simon Pilgrim	011bf4fd96	[X86][AVX] lowerShuffleWithVTRUNC - extend to support v16i16/v32i8 binary shuffles. This requires a few additional SrcVT vs DstVT padding cases in getAVX512TruncNode.	2020-08-18 15:30:02 +01:00
Simon Pilgrim	d5621b83a5	[X86][AVX] lowerShuffleWithVTRUNC - pull out TRUNCATE/VTRUNC creation into helper code. NFCI. Prep work toward adding v16i16/v32i8 support for lowerShuffleWithVTRUNC and improving lowerShuffleWithVPMOV.	2020-08-18 14:52:42 +01:00
Simon Pilgrim	7db5124736	[X86][AVX] lowerShuffleWithVTRUNC - avoid unnecessary division in element counts. NFCI. (256 / SrcEltBits) == ((2 * EltSizeInBits * NumElts) / (EltSizeInBits * Scale)) == (2 * (NumElts / Scale)) == NumSrcElts	2020-08-18 13:48:22 +01:00
Simon Pilgrim	d2057a8015	[X86][AVX] Lower v16i8/v8i16 binary shuffles using VTRUNC/TRUNCATE This patch adds lowerShuffleWithVTRUNC to handle basic binary shuffles that can be lowered either as a pure ISD::TRUNCATE or a X86ISD::VTRUNC (with undef/zero values in the remaining upper elements). We concat the binary sources together into a single 256-bit source vector. To avoid regressions we perform this after we've tried to lower with PACKS/PACKUS which typically does a cleaner job than a concat. For non-AVX512VL cases we have to canonicalize VTRUNC cases to use a 512-bit source vectors (inserting undefs/zeros in the upper elements as necessary), truncate and then (possibly) extract the 128-bit result. This should address the last regressions in D66004 Differential Revision: https://reviews.llvm.org/D86093	2020-08-18 11:11:58 +01:00

1 2 3 4 5 ...

7474 Commits