llvm-project

Commit Graph

Author	SHA1	Message	Date
Simon Pilgrim	4229f12a22	[TargetLowering] Remove isDesirableToCombineBuildVectorToShuffleTruncate target hook. NFC. This hasn't been used for years, its original implementation, D35700, had bugs that caused the reversion of most of the code, and since then x86 shuffle lowering/combining has handled most cases and can deal with the rest as well.	2020-02-08 08:55:51 +00:00
Nico Weber	b03c3d8c62	Revert "Support -fstack-clash-protection for x86" This reverts commit `4a1a0690ad`. Breaks tests on mac and win, see https://reviews.llvm.org/D68720	2020-02-07 14:49:38 -05:00
serge_sans_paille	4a1a0690ad	Support -fstack-clash-protection for x86 Implement protection against the stack clash attack [0] through inline stack probing. Probe stack allocation every PAGE_SIZE during frame lowering or dynamic allocation to make sure the page guard, if any, is touched when touching the stack, in a similar manner to GCC[1]. This extends the existing `probe-stack' mechanism with a special value `inline-asm'. Technically the former uses function call before stack allocation while this patch provides inlined stack probes and chunk allocation. Only implemented for x86. [0] https://www.qualys.com/2017/06/19/stack-clash/stack-clash.txt [1] https://gcc.gnu.org/ml/gcc-patches/2017-07/msg00556.html This a recommit of `39f50da2a3` with correct option flags set. Differential Revision: https://reviews.llvm.org/D68720	2020-02-07 19:54:39 +01:00
serge-sans-paille	f6d98429fc	Revert "Support -fstack-clash-protection for x86" This reverts commit `39f50da2a3`. The -fstack-clash-protection is being passed to the linker too, which is not intended. Reverting and fixing that in a later commit.	2020-02-07 11:36:53 +01:00
serge_sans_paille	39f50da2a3	Support -fstack-clash-protection for x86 Implement protection against the stack clash attack [0] through inline stack probing. Probe stack allocation every PAGE_SIZE during frame lowering or dynamic allocation to make sure the page guard, if any, is touched when touching the stack, in a similar manner to GCC[1]. This extends the existing `probe-stack' mechanism with a special value `inline-asm'. Technically the former uses function call before stack allocation while this patch provides inlined stack probes and chunk allocation. Only implemented for x86. [0] https://www.qualys.com/2017/06/19/stack-clash/stack-clash.txt [1] https://gcc.gnu.org/ml/gcc-patches/2017-07/msg00556.html Differential Revision: https://reviews.llvm.org/D68720	2020-02-07 10:56:15 +01:00
Craig Topper	4175d7e22e	[X86] Custom isel floating point X86ISD::CMP on pre-CMOV targets. Eliminate ConvertCmpIfNecessary If we don't have cmov, X87 compares write to FPSW and we need to move the bits to EFLAGS to use as JCC/SETCC/CMOV conditions. Previously this was done by calling ConvertCmpIfNecessary in multiple places which would emit the extra code for the FNSTSW, a shift, a truncate, and a SAHF instructions. Isel would then select trunc+X86ISD::CMP to a FUCOM instruction that produces FPSW. This patch centralizes all of the handling into a single custom isel handler. This allows us to remove ConvertCmpIfNecessary and a couple target specific ISD opcodes. Differential Revision: https://reviews.llvm.org/D73863	2020-02-06 10:43:06 -08:00
Craig Topper	016f42e3dc	[X86] Add custom lowering for lrint/llrint to either cvtss2si/cvtsd2si or fist. lrint/llrint are defined as rounding using the current rounding mode. Numbers that can't be converted raise FE_INVALID and an implementation defined value is returned. They may also write to errno. I believe this means we can use cvtss2si/cvtsd2si or fist to convert as long as -fno-math-errno is passed on the command line. Clang will leave them as libcalls if errno is enabled so they won't become ISD::LRINT/LLRINT in SelectionDAG. For 64-bit results on a 32-bit target we can't use cvtss2si/cvtsd2si but we can use fist since it can write to a 64-bit memory location. Though maybe we could consider using vcvtps2qq/vcvtpd2qq on avx512dq targets? gcc also does this optimization. I think we might be able to do this with STRICT_LRINT/LLRINT as well, but I've left that for future work. Differential Revision: https://reviews.llvm.org/D73859	2020-02-04 16:15:40 -08:00
Reid Kleckner	2d89e0a098	[SEH] Remove CATCHPAD SDNode and X86::EH_RESTORE MachineInstr The CATCHPAD node mostly existed to be selected into the EH_RESTORE instruction, which sets the frame back up when 32-bit Windows exceptions return to the parent function. However, creating this MachineInstr early increases the risk that other passes will come along and insert instructions that use the stack before ESP and EBP are restored. That happened in PR44697. Instead of representing these in the instruction stream early, delay it until PEI. Mark the blocks where this needs to happen as EHPads, but not funclet entry blocks. Passes after PEI have to be careful not to hoist instructions that can use stack across frame setup instructions, so this should be relatively reliable. Fixes PR44697 Reviewed By: hans Differential Revision: https://reviews.llvm.org/D73752	2020-02-04 15:13:12 -08:00
Craig Topper	943b5561d6	[LegalizeTypes][X86] Add a new strategy for type legalizing f16 type that softens it to i16, but promotes to f32 around arithmetic ops. This is based on this llvm-dev thread http://lists.llvm.org/pipermail/llvm-dev/2019-December/137521.html The current strategy for f16 is to promote type to float every except where the specific width is required like loads, stores, and bitcasts. This results in rounding occurring in odd places instead of immediately after arithmetic operations. This interacts in weird ways with the __fp16 type in clang which is a storage only type where arithmetic is always promoted to float. InstCombine can remove some fpext/fptruncs around such arithmetic and turn it into arithmetic on half. This wouldn't be so bad if SelectionDAG was able to put those fpext/fpround back in when it promotes. It is also not obvious how to handle to make the existing strategy work with STRICT fp. We need to use STRICT versions of the conversions which require chain operands. But if the conversions are created for a bitcast, there is no place to get an appropriate chain from. This patch implements a different strategy where conversions are emitted directly around arithmetic operations. And otherwise its passed around as an i16 including in arguments and return values. This can result in more conversions between arithmetic operations, but is closer to matching the IR the frontend generates for __fp16. And it will allow us to use the chain from constrained arithmetic nodes to link the STRICT_FP_TO_FP16/STRICT_FP16_TO_FP that will need to be added. I've set it up so that each target can opt into the new behavior. Converting all the targets myself was more than I was able to handle. Differential Revision: https://reviews.llvm.org/D73749	2020-02-01 11:21:04 -08:00
Guillaume Chatelet	3c89b75f23	[NFC] Introduce a type to model memory operation Summary: This is a first step before changing the types to llvm::Align and introduce functions to ease client code. Reviewers: courbet Subscribers: arsenm, sdardis, nemanjai, jvesely, nhaehnle, hiraditya, kbarton, jrtc27, atanasyan, jsji, kerbowa, llvm-commits Tags: #llvm Differential Revision: https://reviews.llvm.org/D73785	2020-01-31 17:29:01 +01:00
Wang, Pengfei	3d1f0ce3b9	[X86] Add combination for fma and fneg on X86 under strict FP. Summary: X86 has instructions to calculate fma and fneg at the same time. But we combine the fneg and fma only when fneg is the source operand under strict FP. Reviewers: craig.topper, andrew.w.kaylor, uweigand, RKSimon, LiuChen3 Subscribers: LuoYuanke, llvm-commits, cfe-commits, jdoerfert, hiraditya Tags: #llvm Differential Revision: https://reviews.llvm.org/D72824	2020-01-28 20:09:56 +08:00
Craig Topper	5fa2022ec0	[X86] Remove X86ISD::FILD_FLAG and stop gluing nodes together. Summary: I think whatever problem the gluing was fixing has long since been fixed. We don't have any of the restrictions on FP stack stuff that existed back when this was first added. I had to change which type we use for FILD in BuildFILD when X86 was enabled because most of the isel patterns block f32/f64 instructions when SSE1/SSE2 are enabled. So I needed to use the f80 pattern, but this shouldn't have an effect the generated code since there is only one FILD instruction anyway. We already use f80 explicitly in other other places. Reviewers: RKSimon, spatel Reviewed By: RKSimon Subscribers: andrew.w.kaylor, scanon, hiraditya, llvm-commits Tags: #llvm Differential Revision: https://reviews.llvm.org/D72805	2020-01-18 23:44:05 -06:00
Sanjay Patel	43f60e614a	[x86] try harder to form 256-bit unpck* This is another part of a problem noted in PR42024: https://bugs.llvm.org/show_bug.cgi?id=42024 The AVX2 code may use awkward 256-bit shuffles vs. the AVX code that gets split into the expected 128-bit unpack instructions. We have to be selective in matching the types where we try to do this though. Otherwise, we can end up with more instructions (in the case of v8x32/v4x64). Differential Revision: https://reviews.llvm.org/D72575	2020-01-17 10:42:39 -05:00
Matt Arsenault	255cc5a760	CodeGen: Use LLT instead of EVT in getRegisterByName Only PPC seems to be using it, and only checks some simple cases and doesn't distinguish between FP. Just switch to using LLT to simplify use from GlobalISel.	2020-01-09 17:37:52 -05:00
Simon Pilgrim	3db84f142a	[X86] Merge (identical) LowerGC_TRANSITION_START and LowerGC_TRANSITION_END (NFC) Silences a copy+paste analyzer warning - all they are doing are inserting NOOPs in exactly the same way.	2020-01-05 15:24:57 +00:00
Craig Topper	6962eea2c3	[X86] Move STRICT_ ISD nodes into the new section of X86ISelLowering.h where STRICT nodes are collected after D71841	2020-01-02 12:29:28 -08:00
Ulrich Weigand	63336795f0	[FPEnv] Default NoFPExcept SDNodeFlag to false The NoFPExcept bit in SDNodeFlags currently defaults to true, unlike all other such flags. This is a problem, because it implies that all code that transforms SDNodes without copying flags can introduce a correctness bug, not just a missed optimization. This patch changes the default to false. This makes it necessary to move setting the (No)FPExcept flag for constrained intrinsics from the visitConstrainedIntrinsic routine to the generic visit routine at the place where the other flags are set, or else the intersectFlagsWith call would erase the NoFPExcept flag again. In order to avoid making non-strict FP code worse, whenever SelectionDAGISel::SelectCodeCommon matches on a set of orignal nodes none of which can raise FP exceptions, it will preserve this property on all results nodes generated, by setting the NoFPExcept flag on those result nodes that would otherwise be considered as raising an FP exception. To check whether or not an SD node should be considered as raising an FP exception, the following logic applies: - For machine nodes, check the mayRaiseFPException property of the underlying MI instruction - For regular nodes, check isStrictFPOpcode - For target nodes, check a newly introduced isTargetStrictFPOpcode The latter is implemented by reserving a range of target opcodes, similarly to how memory opcodes are identified. (Note that there a bit of a quirk in identifying target nodes that are both memory nodes and strict FP nodes. To simplify the logic, right now all target memory nodes are automatically also considered strict FP nodes -- this could be fixed by adding one more range.) Reviewed By: craig.topper Differential Revision: https://reviews.llvm.org/D71841	2020-01-02 16:59:45 +01:00
Liu, Chen3	8af492ade1	add strict float for round operation Differential Revision: https://reviews.llvm.org/D72026	2020-01-01 20:42:12 +08:00
Fangrui Song	5edb40c022	[SelectionDAG] Disallow indirect "i" constraint This allows us to delete InlineAsm::Constraint_i workarounds in SelectionDAGISel::SelectInlineAsmMemoryOperand overrides and TargetLowering::getInlineAsmMemConstraint overrides. They were introduced to X86 in r237517 to prevent crashes for constraints like "=*imr". They were later copied to other targets.	2019-12-29 16:50:42 -08:00
Liu, Chen3	1a7b69f5dd	add custom operation for strict fpextend/fpround Differential Revision: https://reviews.llvm.org/D71892	2019-12-27 08:28:33 +08:00
Wang, Pengfei	472bded3ed	[X86] Enable STRICT_SINT_TO_FP/STRICT_UINT_TO_FP on X86 backend Summary: Enable STRICT_SINT_TO_FP/STRICT_UINT_TO_FP on X86 backend Reviewers: craig.topper, RKSimon, LiuChen3, uweigand, andrew.w.kaylor Subscribers: hiraditya, llvm-commits, LuoYuanke Tags: #llvm Differential Revision: https://reviews.llvm.org/D71871	2019-12-26 08:15:13 +08:00
Craig Topper	a21beccea2	[X86] Add STRICT versions of CVTTP2SI, CVTTP2UI, CMPM, and CMPP. Differential Revision: https://reviews.llvm.org/D71850	2019-12-24 10:07:04 -08:00
Craig Topper	bf507d4259	[X86] Make EmitCmp into a static function and explicitly return chain result for STRICT_FCMP. NFCI The only thing its getting from the X86TargetLowering class is the subtarget which we can easily pass. This function only has one call site now since this might help the compiler inline it. Explicitly return both the flag result and the chain result for STRICT_FCMP nodes. This removes an assumption in the caller that getValue(1) is the right way to get the chain.	2019-12-19 23:03:15 -08:00
Sanjay Patel	cdf5cfea8e	Revert "[SDAG] remove use restriction in isNegatibleForFree() when called from getNegatedExpression()" This reverts commit `d1f0bdf2d2`. The patch can cause infinite loops in DAGCombiner.	2019-12-11 16:56:58 -05:00
Sanjay Patel	d1f0bdf2d2	[SDAG] remove use restriction in isNegatibleForFree() when called from getNegatedExpression() This is an alternate fix for the bug discussed in D70595. This also includes minimal tests for other in-tree targets to show the problem more generally. We check the number of uses as a predicate for whether some value is free to negate, but that use count can change as we rewrite the expression in getNegatedExpression(). So something that was marked free to negate during the cost evaluation phase becomes not free to negate during the rewrite phase (or the inverse - something that was not free becomes free). This can lead to a crash/assert because we expect that everything in an expression that is negatible to be handled in the corresponding code within getNegatedExpression(). This patch skips the use check during the rewrite phase. So we determine that some expression isNegatibleForFree (identically to without this patch), but during the rewrite, don't rely on use counts to decide how to create the optimal expression. Differential Revision: https://reviews.llvm.org/D70975	2019-12-11 13:30:39 -05:00
Wang, Pengfei	21bc8631fe	[FPEnv][X86] Constrained FCmp intrinsics enabling on X86 Summary: This is a follow up of D69281, it enables the X86 backend support for the FP comparision. Reviewers: uweigand, kpn, craig.topper, RKSimon, cameron.mcinally, andrew.w.kaylor Subscribers: hiraditya, llvm-commits, annita.zhang, LuoYuanke, LiuChen3 Tags: #llvm Differential Revision: https://reviews.llvm.org/D70582	2019-12-11 08:23:09 +08:00
Craig Topper	8267be2995	[X86] Make X86TargetLowering::BuildFILD return a std::pair of SDValues so we explicitly return the chain instead of calling getValue on the single SDValue. We shouldn't assume that the returned result can be used to get the other result. This is prep-work for strict FP where we will also need to pass the chain result along in more cases.	2019-12-05 17:54:21 -08:00
Craig Topper	3d43c73f26	[X86] Remove override of shouldUseStrictFP_TO_INT for fp80. NFC I suspect this became unnecessary after r354161. Prior to that we may have been going through the default expansion of FP_TO_UINT on 64-bit targets and then ending up back in Custom X86 handling to handle the FP_TO_SINT for it. Now we just Custom handle the FP_TO_UINT directly. We already need to handle it for 32-bit mode during type legalization so we wouldn't save any code by using the default expansion on 64-bit.	2019-12-04 17:58:10 -08:00
Craig Topper	85589f8077	[X86] Add custom type legalization and lowering for scalar STRICT_FP_TO_SINT/UINT This is a first pass at Custom lowering for these operations. I also updated some of the vector code where it was obviously easy and straightforward. More work needed in follow up. This enables these operations to be handled with X87 where special rounding control adjustments are needed to perform a truncate. Still need to fix Promotion in the target independent code in LegalizeDAG. llrint/llround split into separate test file because we can't make a strict libcall properly yet either and we need to do that when i64 isn't a legal type. This does not include any isel support. So we still rely on the mutation in SelectionDAGIsel to remove the strict from this stuff later. Except for the X87 stuff which goes through custom nodes that already had chains. Differential Revision: https://reviews.llvm.org/D70214	2019-11-19 16:05:22 -08:00
Matt Arsenault	b696b9dba7	DAG: Add function context to isFMAFasterThanFMulAndFAdd AMDGPU needs to know the FP mode for the function to answer this correctly when this is removed from the subtarget. AArch64 had to make this more complicated by using this from an IR hook, so add an IR typed overload.	2019-11-19 19:25:26 +05:30
Simon Pilgrim	6716512670	[X86] scaleShuffleMask - use size_t Scale to avoid overflow warnings llvm-svn: 374674	2019-10-12 18:33:47 +00:00
Simon Pilgrim	08c2f530ec	[DAG][X86] Add isNegatibleForFree/GetNegatedExpression override placeholders. NFCI. Continuing to undo the rL372756 reversion. Differential Revision: https://reviews.llvm.org/D67557 llvm-svn: 374345	2019-10-10 13:29:35 +00:00
Craig Topper	105e82edde	[X86] Add a VBROADCAST_LOAD ISD opcode representing a scalar load broadcasted to a vector. Summary: This adds the ISD opcode and a DAG combine to create it. There are probably some places where we can directly create it, but I'll leave that for future work. This updates all of the isel patterns to look for this new node. I had to add a few additional isel patterns for aligned extloads which we should probably fix with a DAG combine or something. This does mean that the broadcast load folding for avx512 can no longer match a broadcasted aligned extload. There's still some work to do here for combining a broadcast of a broadcast_load. We also need to improve extractelement or demanded vector elements of a broadcast_load. I'll try to get those done before I submit this patch. Reviewers: RKSimon, spatel Reviewed By: RKSimon Subscribers: hiraditya, llvm-commits Tags: #llvm Differential Revision: https://reviews.llvm.org/D68198 llvm-svn: 373349	2019-10-01 16:28:20 +00:00
Matt Arsenault	f24ac13aaa	TLI: Remove DAG argument from getRegisterByName Replace with the MachineFunction. X86 is the only user, and only uses it for the function. This removes one obstacle from using this in GlobalISel. The other is the more tolerable EVT argument. The X86 use of the function seems questionable to me. It checks hasFP, before frame lowering. llvm-svn: 373292	2019-10-01 01:44:39 +00:00
Ilya Biryukov	60e5e0b667	Revert r372333: [DAG][X86] Convert isNegatibleForFree/GetNegatedExpression to a target hook (PR42863) Reason: this caused severe compile time regressions in JAX. See email thread of original revision on llvm-commits for details: http://lists.llvm.org/pipermail/llvm-commits/Week-of-Mon-20190923/697042.html llvm-svn: 372756	2019-09-24 13:48:02 +00:00
Sanjay Patel	31b9dfe23f	[x86] fix assert with horizontal math + broadcast of vector (PR43402) https://bugs.llvm.org/show_bug.cgi?id=43402 llvm-svn: 372606	2019-09-23 13:30:23 +00:00
Simon Pilgrim	af6043557d	[DAG][X86] Convert isNegatibleForFree/GetNegatedExpression to a target hook (PR42863) This patch converts the DAGCombine isNegatibleForFree/GetNegatedExpression into overridable TLI hooks and includes a demonstration X86 implementation. The intention is to let us extend existing FNEG combines to work more generally with negatible float ops, allowing it work with target specific combines and opcodes (e.g. X86's FMA variants). Unlike the SimplifyDemandedBits, we can't just handle target nodes through a Target callback, we need to do this as an override to allow targets to handle generic opcodes as well. This does mean that the target implementations has to duplicate some checks (recursion depth etc.). I've only begun to replace X86's FNEG handling here, handling FMADDSUB/FMSUBADD negation and some low impact codegen changes (some FMA negatation propagation). We can build on this in future patches. Differential Revision: https://reviews.llvm.org/D67557 llvm-svn: 372333	2019-09-19 15:02:47 +00:00
Craig Topper	95aea74494	[X86] Split oversized vXi1 vector arguments and return values into scalars on avx512 targets. Previously we tried to split them into narrower v64i1 or v16i1 pieces that each got promoted to vXi8 and then passed in a zmm or xmm register. But this crashes when you need to pass more pieces than available registers reserved for argument passing. The scalarizing done here generates much longer and slower code, but is consistent with the behavior of avx2 and earlier targets for these types. Fixes PR43323. llvm-svn: 372069	2019-09-17 04:41:14 +00:00
Craig Topper	efe6724b9f	[DAGCombiner][X86] Pass the CmpOpVT to reduceSelectOfFPConstantLoads so X86 can exclude fp128 compares. The X86 decision assumes the compare will produce a result in an XMM register, but that can't happen for an fp128 compare since those go to a libcall the returns an i32. Pass the VT so X86 can check the type. llvm-svn: 371775	2019-09-12 21:30:18 +00:00
Craig Topper	08474ca091	[X86] Move x86_64 fp128 conversion to libcalls from type legalization to DAG legalization fp128 is considered a legal type for a register, but has almost no legal operations so everything needs to be converted to a libcall. Previously this was implemented by tricking type legalization into softening the operations with various checks for "is legal in hardware register" to change the behavior to still use f128 as the resulting type instead of converting to i128. This patch abandons this approach and instead moves the libcall conversions to LegalizeDAG. This is the approach taken by AArch64 where they also have a legal fp128 type, but no legal operations. I think this is more in spirit with how SelectionDAG's phases are supposed to work. I had to make some hacks for STRICT_FP_ROUND because some of the strict FP handling checks if ISD::FP_ROUND is Legal for a given result type, but I had to make ISD::FP_ROUND Custom to allow making a libcall when the input is f128. For all other types the Custom handler just returns the original node. These hacks are incomplete and don't work for a strict truncate from f128, but I don't think it worked before either since LegalizeFloatTypes doesn't know about strict ops yet. I've also raised PR43209 against AArch64 which currently crashes on a strict ftrunc from f64->f32 because of FP_ROUND being marked Custom for the same reason there. Differential Revision: https://reviews.llvm.org/D67128 llvm-svn: 371672	2019-09-11 21:30:09 +00:00
Philip Reames	20aafa3156	Introduce infrastructure for an incremental port of SelectionDAG atomic load/store handling This is the first patch in a large sequence. The eventual goal is to have unordered atomic loads and stores - and possibly ordered atomics as well - handled through the normal ISEL codepaths for loads and stores. Today, there handled w/instances of AtomicSDNodes. The result of which is that all transforms need to be duplicated to work for unordered atomics. The benefit of the current design is that it's harder to introduce a silent miscompile by adding an transform which forgets about atomicity. See the thread on llvm-dev titled "FYI: proposed changes to atomic load/store in SelectionDAG" for further context. Note that this patch is NFC unless the experimental flag is set. The basic strategy I plan on taking is: introduce infrastructure and a flag for testing (this patch) Audit uses of isVolatile, and apply isAtomic conservatively* piecemeal conservative* update generic code and x86 backedge code in individual reviews w/tests for cases which didn't check volatile, but can be found with inspection flip the flag at the end (with minimal diffs) Work through todo list identified in (2) and (3) exposing performance ops (*) The "conservative" bit here is aimed at minimizing the number of diffs involved in (4). Ideally, there'd be none. In practice, getting it down to something reviewable by a human is the actual goal. Note that there are (currently) no paths which produce LoadSDNode or StoreSDNode with atomic MMOs, so we don't need to worry about preserving any behaviour there. We've taken a very similar strategy twice before with success - once at IR level, and once at the MI level (post ISEL). Differential Revision: https://reviews.llvm.org/D66309 llvm-svn: 371441	2019-09-09 19:23:22 +00:00
Craig Topper	b8d6ba3ca2	[X86] Override BuildSDIVPow2 for X86. As noted in PR43197, we can use test+add+cmov+sra to implement signed division by a power of 2. This is based off the similar version in AArch64, but I've adjusted it to use target independent nodes where AArch64 uses target specific CMP and CSEL nodes. I've also blocked INT_MIN as the transform isn't valid for that. I've limited this to i32 and i64 on 64-bit targets for now and only when CMOV is supported. i8 and i16 need further investigation to be sure they get promoted to i32 well. I adjusted a few tests to enable cmov to demonstrate the new codegen. I also changed twoaddr-coalesce-3.ll to 32-bit mode without cmov to avoid perturbing the scenario that is being set up there. Differential Revision: https://reviews.llvm.org/D67087 llvm-svn: 371104	2019-09-05 18:15:07 +00:00
Reid Kleckner	3fa07dee94	Revert [Windows] Disable TrapUnreachable for Win64, add SEH_NoReturn This reverts r370525 (git commit `0bb1630685`) Also reverts r370543 (git commit `185ddc08ee`) The approach I took only works for functions marked `noreturn`. In general, a call that is not known to be noreturn may be followed by unreachable for other reasons. For example, there could be multiple call sites to a function that throws sometimes, and at some call sites, it is known to always throw, so it is followed by unreachable. We need to insert an `int3` in these cases to pacify the Windows unwinder. I think this probably deserves its own standalone, Win64-only fixup pass that runs after block placement. Implementing that will take some time, so let's revert to TrapUnreachable in the mean time. llvm-svn: 370829	2019-09-03 22:27:27 +00:00
Reid Kleckner	0bb1630685	[Windows] Disable TrapUnreachable for Win64, add SEH_NoReturn Users have complained llvm.trap produce two ud2 instructions on Win64, one for the trap, and one for unreachable. This change fixes that. TrapUnreachable was added and enabled for Win64 in r206684 (April 2014) to avoid poorly understood issues with the Windows unwinder. There seem to be two major things in play: - the unwinder - C++ EH, _CxxFrameHandler3 & co The unwinder disassembles forward from the return address to scan for epilogues. Inserting a ud2 had the effect of stopping the unwinder, and ensuring that it ran the EH personality function for the current frame. However, it's not clear what the unwinder does when the return address happens to be the last address of one function and the first address of the next function. The Visual C++ EH personality, _CxxFrameHandler3, needs to figure out what the current EH state number is. It does this by consulting the ip2state table, which maps from PC to state number. This seems to go wrong when the return address is the last PC of the function or catch funclet. I'm not sure precisely which system is involved here, but in order to address these real or hypothetical problems, I believe it is enough to insert int3 after a call site if it would otherwise be the last instruction in a function or funclet. I was able to reproduce some similar problems locally by arranging for a noreturn call to appear at the end of a catch block immediately before an unrelated function, and I confirmed that the problems go away when an extra trailing int3 instruction is added. MSVC inserts int3 after every noreturn function call, but I believe it's only necessary to do it if the call would be the last instruction. This change inserts a pseudo instruction that expands to int3 if it is in the last basic block of a function or funclet. I did what I could to run the Microsoft compiler EH tests, and the ones I was able to run showed no behavior difference before or after this change. Differential Revision: https://reviews.llvm.org/D66980 llvm-svn: 370525	2019-08-30 20:46:39 +00:00
Roman Lebedev	cc7495a355	[X86][CodeGen][NFC] Delay `combineIncDecVector()` from DAGCombine to X86DAGToDAGISel Summary: We were previously doing it in DAGCombine. But we also want to do `sub %x, C` -> `add %x, (sub 0, C)` for vectors in DAGCombine. So if we had `sub %x, -1`, we'll transform it to `add %x, 1`, which `combineIncDecVector()` will immediately transform back into `sub %x, -1`, and here we go again... I've marked this as NFC since not a single test changes, but since that 'changes' DAGCombine, probably this isn't fully NFC. Reviewers: RKSimon, craig.topper, spatel Reviewed By: craig.topper Subscribers: hiraditya, llvm-commits Tags: #llvm Differential Revision: https://reviews.llvm.org/D62327 llvm-svn: 370327	2019-08-29 10:50:09 +00:00
Hans Wennborg	cff90f07cb	[SelectionDAG] Don't generate libcalls for wide shifts on Windows (PR42711) Neither libgcc or compiler-rt are usually used on Windows, so these functions can't be called. Differential revision: https://reviews.llvm.org/D66880 llvm-svn: 370204	2019-08-28 13:55:10 +00:00
Simon Pilgrim	9da4989c52	[X86] Remove unused include. NFCI. We don't use anything from TargetOptions.h directly and its included via TargetLowering.h anyhow. llvm-svn: 369110	2019-08-16 14:05:46 +00:00
Craig Topper	2a372ba534	[X86] Add custom type legalization for bitcasting mmx to v2i32/v4i16/v8i8 to use movq2dq instead of going through memory. llvm-svn: 369031	2019-08-15 18:23:37 +00:00
Luo, Yuanke	c6c86f4f81	[X86] Fix stack probe issue on windows32. Summary: On windows if the frame size exceed 4096 bytes, compiler need to generate a call to _alloca_probe. X86CallFrameOptimization pass changes the reserved stack size and cause of stack probe function not be inserted. This patch fix the issue by detecting the call frame size, if the size exceed 4096 bytes, drop X86CallFrameOptimization. Reviewers: craig.topper, wxiao3, annita.zhang, rnk, RKSimon Reviewed By: rnk Subscribers: hiraditya, llvm-commits Tags: #llvm Differential Revision: https://reviews.llvm.org/D65923 llvm-svn: 368503	2019-08-10 02:49:02 +00:00
Craig Topper	a9ed5436bd	[X86] In decomposeMulByConstant, legalize the VT before querying whether the multiply is legal If a type is larger than a legal type and needs to be split, we would previously allow the multiply to be decomposed even if the split multiply is legal. Since the shift + add/sub code would also need to be split, its not any better to decompose it. This patch figures out what type the mul will eventually be legalized to and then uses that type for the query. I tried just returning false illegal types and letting them get handled after type legalization, but then we can't recognize and i64 constant splat on 32-bit targets since will be destroyed by type legalization. We could special case vectors of i64 to avoid that... Differential Revision: https://reviews.llvm.org/D65533 llvm-svn: 367601	2019-08-01 18:49:07 +00:00
Simon Pilgrim	33f5f863b5	[X86][SSE] SimplifyMultipleUseDemandedBits - Add PEXTR/PINSR B+W handling This adds SimplifyMultipleUseDemandedBitsForTargetNode X86 support and uses it to allow us to peek through vector insertions to avoid dependencies on entire insertion chains. llvm-svn: 367570	2019-08-01 14:46:03 +00:00
Roman Lebedev	017e272c3a	[Codegen] (X & (C l>>/<< Y)) ==/!= 0 --> ((X <</l>> Y) & C) ==/!= 0 fold Summary: This was originally reported in D62818. https://rise4fun.com/Alive/oPH InstCombine does the opposite fold, in hope that `C l>>/<< Y` expression will be hoisted out of a loop if `Y` is invariant and `X` is not. But as it is seen from the diffs here, if it didn't get hoisted, the produced assembly is almost universally worse. Much like with my recent "hoist add/sub by/from const" patches, we should get almost universal win if we hoist constant, there is almost always an "and/test by imm" instruction, but "shift of imm" not so much, so we may avoid having to materialize the immediate, and thus need one less register. And since we now shift not by constant, but by something else, the live-range of that something else may reduce. Special care needs to be applied not to disturb x86 `BT` / hexagon `tstbit` instruction pattern. And to not get into endless combine loop. Reviewers: RKSimon, efriedma, t.p.northover, craig.topper, spatel, arsenm Reviewed By: spatel Subscribers: hiraditya, MaskRay, wuzish, xbolva00, nikic, nemanjai, jvesely, wdng, nhaehnle, javed.absar, tpr, kristof.beyls, jsji, llvm-commits Tags: #llvm Differential Revision: https://reviews.llvm.org/D62871 llvm-svn: 366955	2019-07-24 22:57:22 +00:00
Craig Topper	84a1f07363	[X86][AMDGPU][DAGCombiner] Move call to allowsMemoryAccess into isLoadBitCastBeneficial/isStoreBitCastBeneficial to allow X86 to bypass it Basically the problem is that X86 doesn't set the Fast flag from allowsMemoryAccess on certain CPUs due to slow unaligned memory subtarget features. This prevents bitcasts from being folded into loads and stores. But all vector loads and stores of the same width are the same cost on X86. This patch merges the allowsMemoryAccess call into isLoadBitCastBeneficial to allow X86 to skip it. Differential Revision: https://reviews.llvm.org/D64295 llvm-svn: 365549	2019-07-09 19:55:28 +00:00
Sanjay Patel	9126c84f50	[x86] remove stale comment about cmov; NFC The cmov node used to sometimes return a glue result (and that's what 'flag' meant in this context), but that was removed with D38664. llvm-svn: 364687	2019-06-28 21:45:55 +00:00
Clement Courbet	3bc5ad551a	[ExpandMemCmp] Move all options to TargetTransformInfo. Split off from D60318. llvm-svn: 364281	2019-06-25 08:04:13 +00:00
Simon Pilgrim	4e0648a541	[TargetLowering] Add MachineMemOperand::Flags to allowsMemoryAccess tests (PR42123) As discussed on D62910, we need to check whether particular types of memory access are allowed, not just their alignment/address-space. This NFC patch adds a MachineMemOperand::Flags argument to allowsMemoryAccess and allowsMisalignedMemoryAccesses, and wires up calls to pass the relevant flags to them. If people are happy with this approach I can then update X86TargetLowering::allowsMisalignedMemoryAccesses to handle misaligned NT load/stores. Differential Revision: https://reviews.llvm.org/D63075 llvm-svn: 363179	2019-06-12 17:14:03 +00:00
Sanjay Patel	1e63dd0b44	[SelectionDAG][x86] limit post-legalization store merging by type The proposal in D62498 showed that x86 would benefit from vector store splitting, but that may conflict with the generic DAG combiner's store merging transforms. Add memory type to the existing TLI hook that enables the merging transforms, so we can limit those changes to scalars only for x86. llvm-svn: 362507	2019-06-04 15:15:59 +00:00
Pengfei Wang	2e67d0c842	[X86] Add VP2INTERSECT instructions Support Intel AVX512 VP2INTERSECT instructions in llvm Patch by Xiang Zhang (xiangzhangllvm) Differential Revision: https://reviews.llvm.org/D62366 llvm-svn: 362188	2019-05-31 02:50:41 +00:00
Pengfei Wang	1f67d94279	[X86] Add ENQCMD instructions For more details about these instructions, please refer to the latest ISE document: https://software.intel.com/en-us/download/intel-architecture-instruction-set-extensions-programming-reference. Patch by Tianqing Wang (tianqing) Differential Revision: https://reviews.llvm.org/D62281 llvm-svn: 362053	2019-05-30 03:59:16 +00:00
Simon Pilgrim	95b8d9bbf8	[SelectionDAG] computeKnownBits - support constant pool values from target This patch adds the overridable TargetLowering::getTargetConstantFromLoad function which allows targets to return any constant value loaded by a LoadSDNode node - only X86 makes use of this so far but everything should be in place for other targets. computeKnownBits then uses this function to improve codegen, notably vector code after legalization. A future commit will do the same for ComputeNumSignBits but computeKnownBits sees the bigger benefit. This required a couple of fixes: * SimplifyDemandedBits must early-out for getTargetConstantFromLoad cases to prevent infinite loops of constant regeneration (similar to what we already do for BUILD_VECTOR). * Fix a DAGCombiner::visitTRUNCATE issue as we had trunc(shl(v8i32),v8i16) <-> shl(trunc(v8i16),v8i32) infinite loops after legalization on AVX512 targets. Differential Revision: https://reviews.llvm.org/D61887 llvm-svn: 361620	2019-05-24 10:03:11 +00:00
Reid Kleckner	08c15df29f	[X86] Deduplicate symbol lowering logic, NFC Summary: This refactors four pieces of code that create SDNodes for references to symbols: - normal global address lowering (LEA, MOV, etc) - callee global address lowering (CALL) - external symbol address lowering (LEA, MOV, etc) - external symbol address lowering (CALL) Each of these pieces of code need to: - classify the reference - lower the symbol - emit a RIP wrapper if needed - emit a load if needed - add offsets if needed I think handling them all in one place will make the code easier to maintain in the future. Reviewers: craig.topper, RKSimon Subscribers: hiraditya, llvm-commits Tags: #llvm Differential Revision: https://reviews.llvm.org/D61690 llvm-svn: 360952	2019-05-16 23:15:26 +00:00
Sanjay Patel	3a13d970aa	[SDAG, x86] allow targets to override test for binop opcodes This follows the pattern of the existing isCommutativeBinOp(). x86 shows improvements from vector narrowing for the min/max opcodes. llvm-svn: 360639	2019-05-14 00:39:40 +00:00
Luo, Yuanke	beec41c656	Enable AVX512_BF16 instructions, which are supported for BFLOAT16 in Cooper Lake Summary: 1. Enable infrastructure of AVX512_BF16, which is supported for BFLOAT16 in Cooper Lake; 2. Enable VCVTNE2PS2BF16, VCVTNEPS2BF16 and DPBF16PS instructions, which are Vector Neural Network Instructions supporting BFLOAT16 inputs and conversion instructions from IEEE single precision. VCVTNE2PS2BF16: Convert Two Packed Single Data to One Packed BF16 Data. VCVTNEPS2BF16: Convert Packed Single Data to Packed BF16 Data. VDPBF16PS: Dot Product of BF16 Pairs Accumulated into Packed Single Precision. For more details about BF16 isa, please refer to the latest ISE document: https://software.intel.com/en-us/download/intel-architecture-instruction-set-extensions-programming-reference Author: LiuTianle Reviewers: craig.topper, smaslov, LuoYuanke, wxiao3, annita.zhang, RKSimon, spatel Reviewed By: craig.topper Subscribers: kristina, llvm-commits Tags: #llvm Differential Revision: https://reviews.llvm.org/D60550 llvm-svn: 360017	2019-05-06 08:22:37 +00:00
Sjoerd Meijer	180f1ae57c	[TargetLowering] Change getOptimalMemOpType to take a function attribute list The MachineFunction wasn't used in getOptimalMemOpType, but more importantly, this allows reuse of findOptimalMemOpLowering that is calling getOptimalMemOpType. This is the groundwork for the changes in D59766 and D59787, that allows implementation of TTI::getMemcpyCost. Differential Revision: https://reviews.llvm.org/D59785 llvm-svn: 359537	2019-04-30 08:38:12 +00:00
Simon Pilgrim	0a5c2b2449	[X86] scaleShuffleMask - avoid potential signed overflow warning. Use size_t assignment to prevent a bad explicit type conversion warning. Given the typical size of shuffle masks this was never going to happen, but this at least stops the warning. Reported in https://www.viva64.com/en/b/0629/ llvm-svn: 359479	2019-04-29 18:32:06 +00:00
Craig Topper	063b471ff7	[X86] Use MOVQ for i64 atomic_stores when SSE2 is enabled Summary: If we have SSE2 we can use a MOVQ to store 64-bits and avoid falling back to a cmpxchg8b loop. If its a seq_cst store we need to insert an mfence after the store. Reviewers: spatel, RKSimon, reames, jfb, efriedma Reviewed By: RKSimon Subscribers: hiraditya, dexonsmith, llvm-commits Tags: #llvm Differential Revision: https://reviews.llvm.org/D60546 llvm-svn: 359368	2019-04-27 03:38:15 +00:00
Simon Pilgrim	d30745b2a0	[X86] Add shouldFoldConstantShiftPairToMask override placeholder. NFCI. Prep work toward fixing PR40758 llvm-svn: 359088	2019-04-24 12:34:08 +00:00
Simon Pilgrim	e5573f4f4e	[TargetLowering] Rename preferShiftsToClearExtremeBits and shouldFoldShiftPairToMask (PR41359) As discussed on PR41359, this patch renames the pair of shift-mask target feature functions to make their purposes more obvious. shouldFoldShiftPairToMask -> shouldFoldConstantShiftPairToMask preferShiftsToClearExtremeBits -> shouldFoldMaskToVariableShiftPair llvm-svn: 358526	2019-04-16 20:57:28 +00:00
Craig Topper	f7e548c076	Recommit r358211 "[X86] Use FILD/FIST to implement i64 atomic load on 32-bit targets with X87, but no SSE2" With correct test checks this time. If we have X87, but not SSE2 we can atomicaly load an i64 value into the significand of an 80-bit extended precision x87 register using fild. We can then use a fist instruction to convert it back to an i64 integ This matches what gcc and icc do for this case and removes an existing FIXME. llvm-svn: 358214	2019-04-11 19:19:42 +00:00
Craig Topper	8200880c9a	Revert r358211 "[X86] Use FILD/FIST to implement i64 atomic load on 32-bit targets with X87, but no SSE2" I seem to have messed up the test checks. llvm-svn: 358212	2019-04-11 19:04:38 +00:00
Craig Topper	1c2dfc3100	[X86] Use FILD/FIST to implement i64 atomic load on 32-bit targets with X87, but no SSE2 If we have X87, but not SSE2 we can atomicaly load an i64 value into the significand of an 80-bit extended precision x87 register using fild. We can then use a fist instruction to convert it back to an i64 integer and store it to a stack temporary. From there we can do two 32-bit loads to get the value into integer registers without worrying about atomicness. This matches what gcc and icc do for this case and removes an existing FIXME. Differential Revision: https://reviews.llvm.org/D60156 llvm-svn: 358211	2019-04-11 18:40:21 +00:00
Sanjay Patel	50a8652785	[DAGCombiner][x86] scalarize splatted vector FP ops There are a variety of vector patterns that may be profitably reduced to a scalar op when scalar ops are performed using a subset (typically, the first lane) of the vector register file. For x86, this is true for float/double ops and element 0 because insert/extract is just a sub-register rename. Other targets should likely enable the hook in a similar way. Differential Revision: https://reviews.llvm.org/D60150 llvm-svn: 357760	2019-04-05 13:32:17 +00:00
Evandro Menezes	85bd3978ae	[IR] Refactor attribute methods in Function class (NFC) Rename the functions that query the optimization kind attributes. Differential revision: https://reviews.llvm.org/D60287 llvm-svn: 357731	2019-04-04 22:40:06 +00:00
Craig Topper	051bd16faf	[X86] Remove CustomInserters for RDPKRU/WRPKRU. Use some custom lowering and new ISD opcodes instead. These inserters inserted some instructions to zero some registers and copied from virtual registers to physical registers. This change instead inserts the zeros directly into the DAG at lowering time using new ISD opcodes that take the extra zeroes as inputs. The zeros will then go through isel on their own to select the MOV32r0 pseudo. Then we just need to mention the physical registers directly in the isel patterns and the isel table and InstrEmitter will take care of inserting the necessary copies to/from physical registers. llvm-svn: 357659	2019-04-04 00:28:49 +00:00
Simon Pilgrim	aeaf7fcdde	[X86] Add X86TargetLowering::isCommutativeBinOp override. We currently just have test coverage for PMULUDQ - will add more in the future. llvm-svn: 357244	2019-03-29 11:25:58 +00:00
Andrea Di Biagio	624f5deff4	[X86] Remove X86 specific dag nodes for RDTSC/RDTSCP/RDPMC. NFCI This patch removes the following dag node opcodes from namespace X86ISD: RDTSC_DAG, RDTSCP_DAG, RDPMC_DAG The logic that expands RDTSC/RDPMC/XGETBV intrinsics is basically the same. The only differences are: RDTSC/RDTSCP don't implicitly read ECX. RDTSCP also implicitly writes ECX. I moved the common expansion logic into a helper function with the goal to get rid of code repetition. That helper is now used for the expansion of RDTSC/RDTSCP/RDPMC/XGETBV intrinsics. No functional change intended. Differential Revision: https://reviews.llvm.org/D59547 llvm-svn: 356546	2019-03-20 11:21:15 +00:00
Adhemerval Zanella	664c1ef528	[TargetLowering] Add code size information on isFPImmLegal. NFC This allows better code size for aarch64 floating point materialization in a future patch. Reviewers: evandro Differential Revision: https://reviews.llvm.org/D58690 llvm-svn: 356389	2019-03-18 18:40:07 +00:00
Craig Topper	f19d6a4073	[X86] Add SCALAR_SINT_TO_FP/SCALAR_UINT_TO_FP ISD opcodes without rounding mode. After this we no longer need to match FROUND_CURRENT or FROUND_NO_EXC during isel so I remove those. llvm-svn: 355807	2019-03-11 04:37:01 +00:00
Craig Topper	ecbc141dbf	[X86] Split SCALEF(S) ISD opcodes into a version without rounding mode. llvm-svn: 355806	2019-03-11 04:36:59 +00:00
Craig Topper	a0b5338834	[X86] Split RCP28/RSQRT/GETEXP/EXP2 ISD opcodes into SAE and current direction nodes. Remove rounding mode operand. llvm-svn: 355805	2019-03-11 04:36:57 +00:00
Craig Topper	ba7d654526	[X86] Rename _RND versions of RANGE/REDUCE/GETMANT/RDNSCALE ISD opcodes to _SAE. Remove SAE operand. No need to explicitly store it and match it during isel. llvm-svn: 355804	2019-03-11 04:36:55 +00:00
Craig Topper	244ffcdf0d	[X86] Rename X86ISD::CVTPH2PS_RND to CVTPH2PS_SAE. Remove SAE operand. llvm-svn: 355803	2019-03-11 04:36:53 +00:00
Craig Topper	6059b1737e	[X86] Rename the CVTT*_RND ISD nodes to _SAE and remove the SAE operand. Split VFPROUNDS_RND/VFPEXT(S)_RND into versions without rounding operand. For VFPEXT(S) we only need current rounding mode and an SAE version. Neither need extra operand. llvm-svn: 355802	2019-03-11 04:36:51 +00:00
Craig Topper	4c544ca993	[X86] Rename X86ISD::CMPM_RND and X86ISD::FSETCCM_RND to _SAE instead of _RND. Remove rounding operand. The operand could only be the SAE encoding so no need to include it. llvm-svn: 355801	2019-03-11 04:36:49 +00:00
Craig Topper	704303a2a1	[X86] Split the VFIXUPIMM/VFIXUPIMMS nodes into a current rounding mode and SAE ISD opcode. Remove matching of FROUND_CURRENT and FROUND_NO_EXC for these nodes from isel table. llvm-svn: 355800	2019-03-11 04:36:47 +00:00
Craig Topper	b7e6bfe579	[X86] Begin removing matching of FROUND_CURRENT and FROUND_NO_EXC from isel tables. Instead I plan to have dedicated nodes for FROUND_CURRENT and FROUND_NO_EXC. This patch starts with FADDS/FSUBS/FMULS/FDIVS/FMAXS/FMINS/FSQRTS. llvm-svn: 355799	2019-03-11 04:36:44 +00:00
Sanjay Patel	d8b4efcb6b	[CGP] form usub with overflow from sub+icmp The motivating x86 cases for forming the intrinsic are shown in PR31754 and PR40487: https://bugs.llvm.org/show_bug.cgi?id=31754 https://bugs.llvm.org/show_bug.cgi?id=40487 ..and those are shown in the IR test file and x86 codegen file. Matching the usubo pattern is harder than uaddo because we have 2 independent values rather than a def-use. This adds a TLI hook that should preserve the existing behavior for uaddo formation, but disables usubo formation by default. Only x86 overrides that setting for now although other targets will likely benefit by forming usbuo too. Differential Revision: https://reviews.llvm.org/D57789 llvm-svn: 354298	2019-02-18 23:33:05 +00:00
Nirav Dave	7875841121	[X86] Fix LowerAsmOutputForConstraint. Summary: Update Flag when generating cc output. Fixes PR40737. Reviewers: rnk, nickdesaulniers, craig.topper, spatel Subscribers: hiraditya, llvm-commits Tags: #llvm Differential Revision: https://reviews.llvm.org/D58283 llvm-svn: 354163	2019-02-15 20:01:55 +00:00
Craig Topper	3099e442a6	[X86] Refactor the FP_TO_INTHelper interface. NFCI -Pull the final stack load creation from the two callers into the helper. -Return a single SDValue instead of a std::pair. -Remove the Replace flag which isn't really needed. llvm-svn: 353920	2019-02-13 07:42:31 +00:00
Craig Topper	7670ede434	[X86] Collapse FP_TO_INT16_IN_MEM/FP_TO_INT32_IN_MEM/FP_TO_INT64_IN_MEM into a single opcode using memory VT to distinquish. NFC llvm-svn: 353798	2019-02-12 06:14:18 +00:00
Craig Topper	d7303ecd0b	[X86] Remove the value type operand from the floating point load/store MemIntrinsicSDNodes. Use the MemoryVT instead. NFCI We already have the memory VT, we can just match from that during isel. llvm-svn: 353797	2019-02-12 06:14:16 +00:00
Nirav Dave	e5c37958f9	[InlineAsm][X86] Add backend support for X86 flag output parameters. Allow custom handling of inline assembly output parameters and add X86 flag parameter support. llvm-svn: 353307	2019-02-06 15:26:29 +00:00
James Y Knight	7976eb5838	[opaque pointer types] Pass function types to CallInst creation. This cleans up all CallInst creation in LLVM to explicitly pass a function type rather than deriving it from the pointer's element-type. Differential Revision: https://reviews.llvm.org/D57170 llvm-svn: 352909	2019-02-01 20:43:25 +00:00
Sjoerd Meijer	f7cc34cae8	[SelectionDAG] Codesize: don't expand SHIFT to SHIFT_PARTS And instead just generate a libcall. My motivating example on ARM was a simple: shl i64 %A, %B for which the code bloat is quite significant. For other targets that also accept __int128/i128 such as AArch64 and X86, it is also beneficial for these cases to generate a libcall when optimising for minsize. On these 64-bit targets, the 64-bits shifts are of course unaffected because the SHIFT/SHIFT_PARTS lowering operation action is not set to custom/expand. Differential Revision: https://reviews.llvm.org/D57386 llvm-svn: 352736	2019-01-31 08:07:30 +00:00
Craig Topper	4aa74fff1f	[X86] Add masked MCVTSI2P/MCVTUI2P ISD opcodes to model the cvtqq2ps cvtuqq2ps nodes that produce less than 128-bits of results. These nodes zero the upper half of the result and can't be represented with vselect. llvm-svn: 351666	2019-01-19 21:26:20 +00:00
Chandler Carruth	2946cd7010	Update the file headers across all of the LLVM projects in the monorepo to reflect the new license. We understand that people may be surprised that we're moving the header entirely to discuss the new license. We checked this carefully with the Foundation's lawyer and we believe this is the correct approach. Essentially, all code in the project is now made available by the LLVM project under our new license, so you will see that the license headers include that license only. Some of our contributors have contributed code under our old license, and accordingly, we have retained a copy of our old license notice in the top-level files in each project and repository. llvm-svn: 351636	2019-01-19 08:50:56 +00:00
Craig Topper	59abdf5f3f	[X86] Add X86ISD::VSHLV and X86ISD::VSRLV nodes for psllv and psrlv Previously we used ISD::SHL and ISD::SRL to represent these in SelectionDAG. ISD::SHL/SRL interpret an out of range shift amount as undefined behavior and will constant fold to undef. While the intrinsics are defined to return 0 for out of range shift amounts. A previous patch added a special node for VPSRAV to produce all sign bits. This was previously believed safe because undefs frequently get turned into 0 either from the constant pool or a desire to not have a false register dependency. But undef is treated specially in some optimizations. For example, its ignored in detection of vector splats. So if the ISD::SHL/SRL can be constant folded and all of the elements with in bounds shift amounts are the same, we might fold it to single element broadcast from the constant pool. This would not put 0s in the elements with out of bounds shift amounts. We do have an existing InstCombine optimization to use shl/lshr when the shift amounts are all constant and in bounds. That should prevent some loss of constant folding from this change. Patch by zhutianyang and Craig Topper Differential Revision: https://reviews.llvm.org/D56695 llvm-svn: 351381	2019-01-16 21:46:32 +00:00
Craig Topper	5ea3120718	[X86] Use X86ISD::BLENDV for blendv intrinsics. Replace vselect with blendv just before isel table lookup. Remove vselect isel patterns. This cleans up the duplication we have with both intrinsic isel patterns and vselect isel patterns. This should also allow the intrinsics to get SimplifyDemandedBits support for the condition. I've switched the canonical pattern in isel to use the X86ISD::BLENDV node instead of VSELECT. Since it always seemed weird to move from BLENDV with its relaxed rules on condition bits to VSELECT which has strict rules about all bits of the condition element being the same. Its more correct to go from VSELECT to BLENDV. Differential Revision: https://reviews.llvm.org/D56771 llvm-svn: 351380	2019-01-16 21:46:28 +00:00
Craig Topper	0e420e6a62	[X86] Rename SHRUNKBLEND ISD node to BLENDV. That's really what it is. If we didn't use intrinsics for BLENDVPS/BLENDVPD/PBLENDVB all the way to isel, this is the node we would use. llvm-svn: 351278	2019-01-16 00:20:30 +00:00
Craig Topper	31156bbdb9	[X86] Add more ISD nodes to handle masked versions of VCVT(T)PD2DQZ128/VCVT(T)PD2UDQZ128 which only produce 2 result elements and zeroes the upper elements. We can't represent this properly with vselect like we normally do. We also have to update the instruction definition to use a VK2WM mask instead of VK4WM to represent this. Fixes another case from PR34877 llvm-svn: 351018	2019-01-13 02:59:59 +00:00
Craig Topper	4561edbec0	[X86] Add X86ISD::VMFPROUND to handle the masked case of VCVTPD2PSZ128 which only produces 2 result elements and zeroes the upper elements. We can't represent this properly with vselect like we normally do. We also have to update the instruction definition to use a VK2WM mask instead of VK4WM to represent this. Fixes another case from PR34877. llvm-svn: 351017	2019-01-13 02:59:57 +00:00
Craig Topper	90fe6edcba	[X86] Remove X86ISD::SELECT as its no longer used by any of our intrinsic lowering. llvm-svn: 350995	2019-01-12 08:15:54 +00:00
Craig Topper	33b2cf50e3	[X86] Add ISD node for masked version of CVTPS2PH. The 128-bit input produces 64-bits of output and fills the upper 64-bits with 0. The mask only applies to the lower elements. But we can't represent this with a vselect like we normally do. This also avoids the need to have a special X86ISD::SELECT when avx512bw isn't enabled since vselect v8i16 isn't legal there. Fixes another instruction for PR34877. llvm-svn: 350994	2019-01-12 08:05:12 +00:00
Craig Topper	abe6ef8d09	[X86] Add ISD nodes for masked truncate so we can properly represent when the output has more elements than the input due to needing to be 128 bits. We can't properly represent this with a vselect since the upper elements of the result are supposed to be zeroed regardless of the mask. This also reuses the new nodes even when the result type fits in 128 bits if the input is q/d and the result is w/b since vselect w/b using k-register condition isn't legal without avx512bw. Currently we're doing this even when avx512bw is enabled, but I might change that. This fixes some of PR34877 llvm-svn: 350985	2019-01-12 00:55:27 +00:00
Sanjay Patel	9633d76a40	[DAGCombiner][x86] scalarize binop followed by extractelement As noted in PR39973 and D55558: https://bugs.llvm.org/show_bug.cgi?id=39973 ...this is a partial implementation of a fold that we do as an IR canonicalization in instcombine: // extelt (binop X, Y), Index --> binop (extelt X, Index), (extelt Y, Index) We want to have this in the DAG too because as we can see in some of the test diffs (reductions), the pattern may not be visible in IR. Given that this is already an IR canonicalization, any backend that would prefer a vector op over a scalar op is expected to already have the reverse transform in DAG lowering (not sure if that's a realistic expectation though). The transform is limited with a TLI hook because there's an existing transform in CodeGenPrepare that tries to do the opposite transform. Differential Revision: https://reviews.llvm.org/D55722 llvm-svn: 350354	2019-01-03 21:31:16 +00:00
Craig Topper	9d4860ec4e	[X86] Remove X86ISD::INC/DEC. Just select them from X86ISD::ADD/SUB at isel time INC/DEC are pretty much the same as ADD/SUB except that they don't update the C flag. This patch removes the special nodes and just pattern matches from ADD/SUB during isel if the C flag isn't being used. I had to avoid selecting DEC is the result isn't used. This will become a SUB immediate which will turned into a CMP later by optimizeCompareInstr. This lead to the one test change where we use a CMP instead of a DEC for an overflow intrinsic since we only checked the flag. This also exposed a hole in our RMW flag matching use of hasNoCarryFlagUses. Our root node for the match is a store and there's no guarantee that all the flag users have been selected yet. So hasNoCarryFlagUses needs to check copyToReg and machine opcodes, but it also needs to check for the pre-match SETCC, SETCC_CARRY, BRCOND, and CMOV opcodes. Differential Revision: https://reviews.llvm.org/D55975 llvm-svn: 350245	2019-01-02 19:01:05 +00:00
Craig Topper	f7cc7e3201	[X86] Remove the separate SMUL8/UMUL8 X86ISD opcodes by merging with SMUL/UMUL. Remove the second result from X86ISD::UMUL. All of these use custom isel so we can pretty easily detect the differences in the custom code in X86ISelDAGToDAG. The ISD opcodes just need to express the desired semantics not the details of how they would be selected by isel. So unifying them lets us remove the special casing from lowering. llvm-svn: 350206	2019-01-02 06:40:11 +00:00
Craig Topper	a8f07e51f9	[X86] Factor the core code out of LowerSETCC into a helper that can create CMP/BT/PTEST/KORTEST etc. without making an X86ISD::SETCC node. NFCI Make each of the helper functions only return their comparison node and the condition code. Leave X86ISD::SETCC creation to the LowerSETCC function itself. Looking into whether we can use this code directly in BRCOND and SELECT lowering instead of going through LowerSETCC which creates an X86ISD::SETCC node we need to look through. llvm-svn: 350082	2018-12-27 01:50:40 +00:00
Nikita Popov	f6058ff140	[X86] Use SADDSAT/SSUBSAT instead of ADDS/SUBS Migrate the X86 backend from X86ISD opcodes ADDS and SUBS to generic ISD opcodes SADDSAT and SSUBSAT. This also improves scodegen for @llvm.sadd.sat() and @llvm.ssub.sat() intrinsics. This is a followup to D55787 and part of PR40056. Differential Revision: https://reviews.llvm.org/D55833 llvm-svn: 349520	2018-12-18 18:28:22 +00:00
Nikita Popov	665ab08178	[X86] Use UADDSAT/USUBSAT instead of ADDUS/SUBUS Replace the X86ISD opcodes ADDUS and SUBUS with generic ISD opcodes UADDSAT and USUBSAT. As a side-effect, this also makes codegen for the @llvm.uadd.sat and @llvm.usub.sat intrinsics reasonable. This only replaces use in the X86 backend, and does not move any of the ADDUS/SUBUS X86 specific combines into generic codegen. Differential Revision: https://reviews.llvm.org/D55787 llvm-svn: 349481	2018-12-18 13:23:03 +00:00
Simon Pilgrim	9274f17a5e	[TargetLowering] Add DemandedElts mask to SimplifyDemandedBits (PR40000) This is an initial patch to add the necessary support for a DemandedElts argument to SimplifyDemandedBits, more closely matching computeKnownBits and to help improve vector codegen. I've added only a small amount of the changes necessary to get at least one test to update - a lot more can be done but I'd like to add these methodically with proper test coverage, at the same time the hope is to slowly move some/all of SimplifyDemandedVectorElts into SimplifyDemandedBits as well. Differential Revision: https://reviews.llvm.org/D55768 llvm-svn: 349374	2018-12-17 18:43:43 +00:00
Craig Topper	178abc59ac	[X86] Demote EmitTest to a helper function of EmitCmp. Route all callers except EmitCmp through EmitCmp. This requires the two callers to manifest a 0 to make EmitCmp call EmitTest. I'm looking into changing how we combine TEST and flag setting instructions to not be part of lowering. And instead be part of DAG combine or isel. Which will mean EmitTest will probably become gutted and maybe disappear entirely. llvm-svn: 349094	2018-12-13 23:55:30 +00:00
Nirav Dave	ce26c27b2a	[SelectionDAG] Redefine isGAPlusOffset in terms of unwrapAddress. NFCI. llvm-svn: 348288	2018-12-04 17:59:43 +00:00
Simon Pilgrim	0add090e24	[TargetLowering] expandFP_TO_UINT - avoid FPE due to out of range conversion (PR17686) PR17686 demonstrates that for some targets FP exceptions can fire in cases where the FP_TO_UINT is expanded using a FP_TO_SINT instruction. The existing code converts both the inrange and outofrange cases using FP_TO_SINT and then selects the result, this patch changes this for 'strict' cases to pre-select the FP_TO_SINT input and the offset adjustment. The X87 cases don't need the strict flag but generates much nicer code with it.... Differential Revision: https://reviews.llvm.org/D53794 llvm-svn: 348251	2018-12-04 11:21:30 +00:00
Sanjay Patel	7336e7c67a	[x86] limit transform for select-of-fp-constants This should likely be adjusted to limit this transform further, but these diffs should be clear wins. If we have blendv/conditional move, then we should assume those are cheap ops. The loads become independent of the compare, so those can be speculated before we need to use the values in the blend/mov. llvm-svn: 347526	2018-11-25 17:27:02 +00:00
Craig Topper	aca8390216	[SelectionDAG][X86] Relax restriction on the width of an input to _EXTEND_VECTOR_INREG. Use them and regular _EXTEND to replace the X86 specific VSEXT/VZEXT opcodes Previously, the extend_vector_inreg opcode required their input register to be the same total width as their output. But this doesn't match up with how the X86 instructions are defined. For X86 the input just needs to be a legal type with at least enough elements to cover the output. This patch weakens the check on these nodes and allows them to be used as long as they have more input elements than output elements. I haven't changed type legalization behavior so it will still create them with matching input and output sizes. X86 will custom legalize these nodes by shrinking the input to be a 128 bit vector and once we've done that we treat them as legal operations. We still have one case during type legalization where we must custom handle v64i8 on avx512f targets without avx512bw where v64i8 isn't a legal type. In this case we will custom type legalize to a *extend_vector_inreg with a v16i8 input. After that the input is a legal type so type legalization should ignore the node and doesn't need to know about the relaxed restriction. We are no longer allowed to use the default expansion for these nodes during vector op legalization since the default expansion uses a shuffle which required the widths to match. Custom legalization for all types will prevent us from reaching the default expansion code. I believe DAG combine works correctly with the released restriction because it doesn't check the number of input elements. The rest of the patch is changing X86 to use either the vector_inreg nodes or the regular zero_extend/sign_extend nodes. I had to add additional isel patterns to handle any_extend during isel since simplifydemandedbits can create them at any time so we can't legalize to zero_extend before isel. We don't yet create any_extend_vector_inreg in simplifydemandedbits. Differential Revision: https://reviews.llvm.org/D54346 llvm-svn: 346784	2018-11-13 19:45:21 +00:00
Craig Topper	0b5f8169b0	[TargetLowering] Change TargetLoweringBase::getPreferredVectorAction to take an MVT instead of an EVT. NFC The main caller of this already has an MVT and several targets called getSimpleVT inside without checking isSimple. This makes the simpleness explicit. llvm-svn: 346180	2018-11-05 23:26:13 +00:00
Simon Pilgrim	c5bb362b13	[X86][SSE] Add SimplifyDemandedBitsForTargetNode PMULDQ/PMULUDQ handling Add X86 SimplifyDemandedBitsForTargetNode and use it to simplify PMULDQ/PMULUDQ target nodes. This enables us to repeatedly simplify the node's arguments after the previous approach had to be reverted due to PR39398. Differential Revision: https://reviews.llvm.org/D53643 llvm-svn: 345182	2018-10-24 19:11:28 +00:00
Matthias Braun	4f82406c46	SelectionDAG: Reuse bigger sized constants in memset expansion. When implementing memset's today we often see this pattern: $x0 = MOV 0xXYXYXYXYXYXYXYXY store $x0, ... $w1 = MOV 0xXYXYXYXY store $w1, ... We first create a 64bit constant in a 64bit register with all bytes the same and then create a 32bit constant with all bytes the same in a 32bit register. In many targets we could just access the lower byte of the 64bit register instead. - Ideally this would be handled by the ConstantHoist pass but it runs too early when memset isn't expanded yet. - The memset expansion code already had this optimization implemented, however SelectionDAG constantfolding would constantfold the "trunc(bigconstnat)" pattern to "smallconstant". - This patch makes the memset expansion mark the constant as Opaque and stop DAGCombiner from constant folding in this situation. (Similar to how ConstantHoisting marks things as Opaque to avoid folding ADD/SUB/etc.) Differential Revision: https://reviews.llvm.org/D53181 llvm-svn: 345102	2018-10-23 23:19:23 +00:00
Roman Lebedev	898808504d	[X86] X86DAGToDAGISel: handle BZHI selection too, not just BEXTR. Summary: As discussed in D52304 / IRC, we now have pattern matching for 'bit extract' in two places - tablegen and `X86DAGToDAGISel`. There are 4 patterns. And we will have a problem with `x & (-1 >> (32 - y))` pattern. * If the mask is one-use, then it is always unfolded into `x << (32 - y) >> (32 - y)` first. Thus, the existing test coverage is already broken. * If it is not one-use, then it is not unfolded, and is matched as BZHI. * If it is not one-use, we will not match it as BEXTR. And if it is one-use, it will have been unfolded already. So we will either not handle that pattern for BEXTR, or not have test coverage for it. This is bad. As discussed with @craig.topper, let's unify this matching, and do everything in `X86DAGToDAGISel`. Then we will not have code duplication, and will have proper test coverage. This indeed does not affect any tests, and this is great. It means that for these two patterns, the `X86DAGToDAGISel` is identical to the tablegen version. Please review carefully, i'm not fully sure about that intrinsic change, and introduction of the new `X86ISD` opcode. Reviewers: craig.topper, RKSimon, spatel Reviewed By: craig.topper Subscribers: llvm-commits, craig.topper Differential Revision: https://reviews.llvm.org/D53164 llvm-svn: 344904	2018-10-22 14:12:44 +00:00
Craig Topper	5eea94edd4	[X86] Remove SDIVREM8_SEXT_HREG/UDIVREM8_ZEXT_HREG and their associated DAG combine and target bits support. Use a post isel peephole instead. Summary: These nodes exist to overcome an isel problem where we can generate a zero extend of an AH register followed by an extract subreg, and another zero extend. The first zero extend exists to avoid a partial register update copying the AH register into the low 8-bits. The second zero extend exists if the user wanted the remainder zero extended. To make this work we had a DAG combine to morph the DIVREM opcode to a special opcode that included the extend. But then we had to add the new node to computeKnownBits and computeNumSignBits to process the extension portion. This patch instead removes all of that and adds a late peephole to detect the two extends. Reviewers: RKSimon, spatel Reviewed By: RKSimon Subscribers: llvm-commits Differential Revision: https://reviews.llvm.org/D53449 llvm-svn: 344874	2018-10-21 21:07:27 +00:00
Simon Pilgrim	8191d63c3b	[X86] Add initial SimplifyDemandedVectorEltsForTargetNode support This patch adds an initial x86 SimplifyDemandedVectorEltsForTargetNode implementation to handle target shuffles. Currently the patch only decodes a target shuffle, calls SimplifyDemandedVectorElts on its input operands and removes any shuffle that reduces to undef/zero/identity. Future work will need to integrate this with combineX86ShufflesRecursively, add support for other x86 ops, etc. NOTE: There is a minor regression that appears to be affecting further (extractelement?) combines which I haven't been able to solve yet - possibly something to do with how nodes are added to the worklist after simplification. Differential Revision: https://reviews.llvm.org/D52140 llvm-svn: 342564	2018-09-19 18:11:34 +00:00
Sanjay Patel	4fd2e2a498	[DAGCombiner][x86] add transform/hook to decompose integer multiply into shift/add This is an alternative to D37896. I don't see a way to decompose multiplies generically without a target hook to tell us when it's profitable. ARM and AArch64 may be able to remove some duplicate code that overlaps with this transform. As a first step, we're only getting the most clear wins on the vector examples requested in PR34474: https://bugs.llvm.org/show_bug.cgi?id=34474 As noted in the code comment, it's likely that the x86 constraints are tighter than necessary, but it may not always be a win to replace a pmullw/pmulld. Differential Revision: https://reviews.llvm.org/D52195 llvm-svn: 342554	2018-09-19 15:57:40 +00:00
Sanjay Patel	113cac3b15	[SelectionDAG][x86] turn insertelement into undef with variable index into splat I noticed this along with the patterns in D51125, but when the index is variable, we don't convert insertelement into a build_vector. For x86, that means these get expanded at legalization time into the loading/spilling code that we see in the tests. I think it's always better to avoid going to memory on these, and we get the optimal 'broadcast' if it's available. I suspect other targets may want to look at enabling the hook. AArch64 and AMDGPU have regression tests that would be affected (although I did not check what would happen in those cases). In the most basic cases shown here, AArch64 would probably do much better with a splat. Differential Revision: https://reviews.llvm.org/D51186 llvm-svn: 340705	2018-08-26 18:20:41 +00:00
Craig Topper	a11a3b3818	[SelectionDAG][X86] Reorder the operands the MaskedStoreSDNode to put the value first. Summary: Previously the value being stored is the last operand in SDNode. This causes the type legalizer to visit the mask operand before the value operand. The type legalizer was more complicated because of this since we want the type of the value to drive the decisions. This patch moves the value to be the first operand so we visit it first during type legalization. It also simplifies the type legalization code accordingly. X86 is currently the only in tree target that uses this SDNode. Not sure if there are any users out of tree. Reviewers: RKSimon, delena, hfinkel, eli.friedman Reviewed By: RKSimon Subscribers: llvm-commits Differential Revision: https://reviews.llvm.org/D50402 llvm-svn: 340689	2018-08-25 17:48:17 +00:00
Craig Topper	633fe98e27	[X86] Change legacy SSE scalar fp to integer intrinsics to use specific ISD opcodes instead of keeping as intrinsics. Unify SSE and AVX512 isel patterns. AVX512 added new versions of these intrinsics that take a rounding mode. If the rounding mode is 4 the new intrinsics are equivalent to the old intrinsics. The AVX512 intrinsics were being lowered to ISD opcodes, but the legacy SSE intrinsics were left as intrinsics. This resulted in the AVX512 instructions needing separate patterns for the ISD opcodes and the legacy SSE intrinsics. Now we convert SSE intrinsics and AVX512 intrinsics with rounding mode 4 to the same ISD opcode so we can share the isel patterns. llvm-svn: 339749	2018-08-15 01:23:00 +00:00
Craig Topper	17989208a9	[SelectionDAG][X86] Rename getValue to getPassThru for gather SDNodes. getValue is more meaningful name for scatter than it is for gather. Split them and use getPassThru for gather. llvm-svn: 339096	2018-08-07 06:13:40 +00:00
Fangrui Song	f78650a8de	Remove trailing space sed -Ei 's/[[:space:]]+$//' include/*/.{def,h,td} lib/*/.{cpp,h} llvm-svn: 338293	2018-07-30 19:41:25 +00:00
Matt Arsenault	81920b0a25	DAG: Add calling convention argument to calling convention funcs This seems like a pretty glaring omission, and AMDGPU wants to treat kernels differently from other calling conventions. llvm-svn: 338194	2018-07-28 13:25:19 +00:00
Benjamin Kramer	64c7fa3201	Revert "[X86][AVX] Convert X86ISD::VBROADCAST demanded elts combine to use SimplifyDemandedVectorElts" This reverts commit r337547. It triggers an infinite loop. llvm-svn: 337617	2018-07-20 20:59:46 +00:00
Simon Pilgrim	6fb8b68b2d	[X86][AVX] Convert X86ISD::VBROADCAST demanded elts combine to use SimplifyDemandedVectorElts This is an early step towards using SimplifyDemandedVectorElts for target shuffle combining - this merely moves the existing X86ISD::VBROADCAST simplification code to use the SimplifyDemandedVectorElts mechanism. Adds X86TargetLowering::SimplifyDemandedVectorEltsForTargetNode to handle X86ISD::VBROADCAST - in time we can support all target shuffles (and other ops) here. llvm-svn: 337547	2018-07-20 13:26:51 +00:00
Roman Lebedev	de506632aa	[X86][AArch64][DAGCombine] Unfold 'check for [no] signed truncation' pattern Summary: [[ https://bugs.llvm.org/show_bug.cgi?id=38149 \| PR38149 ]] As discussed in https://reviews.llvm.org/D49179#1158957 and later, the IR for 'check for [no] signed truncation' pattern can be improved: https://rise4fun.com/Alive/gBf ^ that pattern will be produced by Implicit Integer Truncation sanitizer, https://reviews.llvm.org/D48958 https://bugs.llvm.org/show_bug.cgi?id=21530 in signed case, therefore it is probably a good idea to improve it. But the IR-optimal patter does not lower efficiently, so we want to undo it.. This handles the simple pattern. There is a second pattern with predicate and constants inverted. NOTE: we do not check uses here. we always do the transform. Reviewers: spatel, craig.topper, RKSimon, javed.absar Reviewed By: spatel Subscribers: kristof.beyls, llvm-commits Differential Revision: https://reviews.llvm.org/D49266 llvm-svn: 337166	2018-07-16 12:44:10 +00:00
Craig Topper	73347ec081	[X86] Remove patterns and ISD nodes for the old scalar FMA intrinsic lowering. We now use llvm.fma.f32/f64 or llvm.x86.fmadd.f32/f64 intrinsics that use scalar types rather than vector types. So we don't these special ISD nodes that operate on the lowest element of a vector. llvm-svn: 336883	2018-07-12 03:42:41 +00:00
Craig Topper	dea0b88b04	[X86] Remove X86ISD::MOVLPS and X86ISD::MOVLPD. NFCI These ISD nodes try to select the MOVLPS and MOVLPD instructions which are special load only instructions. They load data and merge it into the lower 64-bits of an XMM register. They are logically equivalent to our MOVSD node plus a load. There was only one place in X86ISelLowering that used MOVLPD and no places that selected MOVLPS. The one place that selected MOVLPD had to choose between it and MOVSD based on whether there was a load. But lowering is too early to tell if the load can really be folded. So in isel we have patterns that use MOVSD for MOVLPD if we can't find a load. We also had patterns that select the MOVLPD instruction for a MOVSD if we can find a load, but didn't choose the MOVLPD ISD opcode for some reason. So it seems better to just standardize on MOVSD ISD opcode and manage MOVSD vs MOVLPD instruction with isel patterns. llvm-svn: 336728	2018-07-10 21:00:22 +00:00
Roman Lebedev	5ccae1750b	[X86][TLI] DAGCombine: Unfold variable bit-clearing mask to two shifts. Summary: This adds a reverse transform for the instcombine canonicalizations that were added in D47980, D47981. As discussed later, that was worse at least for the code size, and potentially for the performance, too. https://rise4fun.com/Alive/Zmpl Reviewers: craig.topper, RKSimon, spatel Reviewed By: spatel Subscribers: reames, llvm-commits Differential Revision: https://reviews.llvm.org/D48768 llvm-svn: 336585	2018-07-09 19:06:42 +00:00
Craig Topper	c60e1807b3	[X86] Remove FMA4 scalar intrinsics. Use llvm.fma intrinsic instead. The intrinsics can be implemented with a f32/f64 llvm.fma intrinsic and an insert into a zero vector. There are a couple regressions here due to SelectionDAG not being able to pull an fneg through an extract_vector_elt. I'm not super worried about this though as InstCombine should be able to do it before we get to SelectionDAG. llvm-svn: 336416	2018-07-06 07:14:41 +00:00
Simon Pilgrim	aa2bf2be31	[TargetLowering] isVectorClearMaskLegal - use ArrayRef<int> instead of const SmallVectorImpl<int>& This is more generic and matches isShuffleMaskLegal. Differential Revision: https://reviews.llvm.org/D48591 llvm-svn: 335605	2018-06-26 14:15:31 +00:00
Craig Topper	c2696d577b	[X86] Use setcc ISD opcode for AVX512 integer comparisons all the way to isel I don't believe there is any real reason to have separate X86 specific opcodes for vector compares. Setcc has the same behavior just uses a different encoding for the condition code. I had to change the CondCodeAction for SETLT and SETLE to prevent some transforms from changing SETGT lowering. Differential Revision: https://reviews.llvm.org/D43608 llvm-svn: 335173	2018-06-20 21:05:02 +00:00
Alexander Ivchenko	964b27fa21	[X86][CET] Shadow stack fix for setjmp/longjmp This is the new version of D46181, allowing setjmp/longjmp to work correctly with the Intel CET shadow stack by storing SSP on setjmp and fixing it on longjmp. The patch has been updated to use the cf-protection-return module flag instead of HasSHSTK, and the bug that caused D46181 to be reverted has been fixed with the test expanded to track that fix. patch by mike.dvoretsky Differential Revision: https://reviews.llvm.org/D47311 llvm-svn: 333990	2018-06-05 09:22:30 +00:00
Matt Arsenault	ab2b79cb97	DAG: Remove redundant version of getRegisterTypeForCallingConv There seems to be no real reason to have these separate copies. The existing implementations just copy each other for x86. For Mips there is a subtle difference, which is just a bug since it changes based on the context where which one was called. Dropping this version, all tests pass. If I try to merge them to match the removed version, a test fails. llvm-svn: 333440	2018-05-29 17:42:26 +00:00
Craig Topper	dcfcfdb0d1	[X86] Converge X86ISD::VPERMV3 and X86ISD::VPERMIV3 to a single opcode. These do the same thing with the first and second sources swapped. They previously came from separate intrinsics that specified different masking behavior. But we can cover that with isel patterns and a single node. This is a step towards reducing the number of intrinsics needed. A bunch of tests change because we are now biased to choosing VPERMT over VPERMI when there is nothing to signal that commuting is beneficial. llvm-svn: 333383	2018-05-28 19:33:11 +00:00
Jessica Paquette	ec37c640dd	Revert "[X86][CET] Shadow stack fix for setjmp/longjmp" This reverts commit 30962eca38ef02666ebcdded72a94f2cd0292d68. This commit has been causing test asan failures on a build bot. http://green.lab.llvm.org/green/job/clang-stage1-configure-RA/45108/ Original commit: https://reviews.llvm.org/D46181 llvm-svn: 331813	2018-05-08 22:00:57 +00:00
Alexander Ivchenko	c47f799289	[X86][CET] Shadow stack fix for setjmp/longjmp This patch adds a shadow stack fix when compiling setjmp/longjmp with the shadow stack enabled. This allows setjmp/longjmp to work correctly with CET. Patch by mike.dvoretsky Differential Revision: https://reviews.llvm.org/D46181 llvm-svn: 331748	2018-05-08 09:04:07 +00:00
Roman Lebedev	cc42d08b1d	[DagCombiner] Not all 'andn''s work with immediates. Summary: Split off from D46031. In masked merge case, this degrades IPC by decreasing instruction count. {F6108777} The next patch should be able to recover and improve this. This also affects the transform @spatel have added in D27489 / rL289738, and the test coverage for X86 was missing. But after i have added it, and looked at the changes in MCA, i'm somewhat confused. {F6093591} {F6093592} {F6093593} I'd say this regression is an improvement, since `IPC` increased in that case? Reviewers: spatel, craig.topper Reviewed By: spatel Subscribers: andreadb, llvm-commits, spatel Differential Revision: https://reviews.llvm.org/D46493 llvm-svn: 331684	2018-05-07 21:52:11 +00:00
Adrian Prantl	5f8f34e459	Remove \brief commands from doxygen comments. We've been running doxygen with the autobrief option for a couple of years now. This makes the \brief markers into our comments redundant. Since they are a visual distraction and we don't want to encourage more \brief markers in new code either, this patch removes them all. Patch produced by for i in $(git grep -l '\\brief'); do perl -pi -e 's/\\brief //g' $i & done Differential Revision: https://reviews.llvm.org/D46290 llvm-svn: 331272	2018-05-01 15:54:18 +00:00
Craig Topper	d656410293	[X86] Make the STTNI flag intrinsics use the flags from pcmpestrm/pcmpistrm if the mask instrinsics are also used in the same basic block. Summary: Previously the flag intrinsics always used the index instructions even if a mask instruction also exists. To fix fix this I've created a single ISD node type that returns index, mask, and flags. The SelectionDAG CSE process will merge all flavors of intrinsics with the same inputs to a s ingle node. Then during isel we just have to look at which results are used to know what instruction to generate. If both mask and index are used we'll need to emit two instructions. But for all other cases we can emit a single instruction. Since I had to do manual isel anyway, I've removed the pseudo instructions and custom inserter code that was working around tablegen limitations with multiple implicit defs. I've also renamed the recently added sse42.ll test case to sttni.ll since it focuses on that subset of the sse4.2 instructions. Reviewers: chandlerc, RKSimon, spatel Reviewed By: chandlerc Subscribers: llvm-commits Differential Revision: https://reviews.llvm.org/D46202 llvm-svn: 331091	2018-04-27 22:15:33 +00:00
Gabor Buella	31fa8025ba	[X86] WaitPKG instructions Three new instructions: umonitor - Sets up a linear address range to be monitored by hardware and activates the monitor. The address range should be a writeback memory caching type. umwait - A hint that allows the processor to stop instruction execution and enter an implementation-dependent optimized state until occurrence of a class of events. tpause - Directs the processor to enter an implementation-dependent optimized state until the TSC reaches the value in EDX:EAX. Also modifying the description of the mfence instruction, as the rep prefix (0xF3) was allowed before, which would conflict with umonitor during disassembly. Before: $ echo 0xf3,0x0f,0xae,0xf0 \| llvm-mc -disassemble .text mfence After: $ echo 0xf3,0x0f,0xae,0xf0 \| llvm-mc -disassemble .text umonitor %rax Reviewers: craig.topper, zvi Reviewed By: craig.topper Differential Revision: https://reviews.llvm.org/D45253 llvm-svn: 330462	2018-04-20 18:42:47 +00:00
Sriraman Tallam	d693093a65	GOTPCREL references must always use RIP. With -fno-plt, global value references can use GOTPCREL and RIP must be used. Differential Revision: https://reviews.llvm.org/D45460 llvm-svn: 329765	2018-04-10 22:50:05 +00:00
Chandler Carruth	19618fc639	[x86] Introduce a pass to begin more systematically fixing PR36028 and similar issues. The key idea is to lower COPY nodes populating EFLAGS by scanning the uses of EFLAGS and introducing dedicated code to preserve the necessary state in a GPR. In the vast majority of cases, these uses are cmovCC and jCC instructions. For such cases, we can very easily save and restore the necessary information by simply inserting a setCC into a GPR where the original flags are live, and then testing that GPR directly to feed the cmov or conditional branch. However, things are a bit more tricky if arithmetic is using the flags. This patch handles the vast majority of cases that seem to come up in practice: adc, adcx, adox, rcl, and rcr; all without taking advantage of partially preserved EFLAGS as LLVM doesn't currently model that at all. There are a large number of operations that techinaclly observe EFLAGS currently but shouldn't in this case -- they typically are using DF. Currently, they will not be handled by this approach. However, I have never seen this issue come up in practice. It is already pretty rare to have these patterns come up in practical code with LLVM. I had to resort to writing MIR tests to cover most of the logic in this pass already. I suspect even with its current amount of coverage of arithmetic users of EFLAGS it will be a significant improvement over the current use of pushf/popf. It will also produce substantially faster code in most of the common patterns. This patch also removes all of the old lowering for EFLAGS copies, and the hack that forced us to use a frame pointer when EFLAGS copies were found anywhere in a function so that the dynamic stack adjustment wasn't a problem. None of this is needed as we now lower all of these copies directly in MI and without require stack adjustments. Lots of thanks to Reid who came up with several aspects of this approach, and Craig who helped me work out a couple of things tripping me up while working on this. Differential Revision: https://reviews.llvm.org/D45146 llvm-svn: 329657	2018-04-10 01:41:17 +00:00
Oren Ben Simhon	fdd72fd522	[X86] Added support for nocf_check attribute for indirect Branch Tracking X86 Supports Indirect Branch Tracking (IBT) as part of Control-Flow Enforcement Technology (CET). IBT instruments ENDBR instructions used to specify valid targets of indirect call / jmp. The `nocf_check` attribute has two roles in the context of X86 IBT technology: 1. Appertains to a function - do not add ENDBR instruction at the beginning of the function. 2. Appertains to a function pointer - do not track the target function of this pointer by adding nocf_check prefix to the indirect-call instruction. This patch implements `nocf_check` context for Indirect Branch Tracking. It also auto generates `nocf_check` prefixes before indirect branchs to jump tables that are guarded by range checks. Differential Revision: https://reviews.llvm.org/D41879 llvm-svn: 327767	2018-03-17 13:29:46 +00:00

1 2 3 4 5 ...

1162 Commits