llvm-project

Commit Graph

Author	SHA1	Message	Date
Craig Topper	b9376690a0	[X86] Improve lowering of v2i64 sign bit tests on pre-sse4.2 targets Without sse4.2 a v2i64 setlt needs to expand into a pcmpgtd, pcmpeqd, 3 shuffles, and 2 logic ops. But if we're only interested in the sign bit of the i64 elements, we can just use one pcmpgtd and shuffle the odd elements to the even elements. Differential Revision: https://reviews.llvm.org/D72302	2020-01-07 11:22:03 -08:00
Simon Pilgrim	0e912e22b6	[X86] Pull out repeated SrcVT.getVectorNumElements() call. NFCI.	2020-01-07 16:51:10 +00:00
Simon Pilgrim	c0365aaaa4	[X86] Standardize shuffle match/lowering function names. NFC. We mainly use lowerShuffle/matchShuffle - replace the (few) lowerVectorShuffle/matchVectorShuffle cases to be consistent.	2020-01-07 13:41:52 +00:00
Craig Topper	6a0564adcf	[X86] Improve v4i32->v4f64 uint_to_fp for AVX1/AVX2 targets. Use zext+or+fsub to do the conversion. Similar to D71971. Differential Revision: https://reviews.llvm.org/D71971	2020-01-06 14:07:35 -08:00
Craig Topper	95840866b7	[X86] Improve v2i64->v2f32 and v4i64->v4f32 uint_to_fp on avx and avx2 targets. Summary: Based on Simon's D52965, but improved to handle strict fp and improve some of the shuffling. Rather than use v2i1/v4i1 and let type legalization continue, just generate all the code with legal types and use an explicit shuffle. I also added an explicit setcc to the v4i64 code to match the semantics of vselect which doesn't just use the sign bit. I'm also using a v4i64->v4i32 truncate instead of the shuffle in Simon's original code. With the setcc this will become a pack. Future work can look into using X86ISD::BLENDV and a different shuffle that only moves the sign bit. Reviewers: RKSimon, spatel Reviewed By: RKSimon Subscribers: hiraditya, llvm-commits Tags: #llvm Differential Revision: https://reviews.llvm.org/D71956	2020-01-05 17:44:08 -08:00
Liu, Chen3	ca3bf289a7	[NFC] Modify the format: Drop the else since we alerady returned in the if.	2020-01-06 09:35:19 +08:00
Simon Pilgrim	e3bd011890	[X86][SSE] Combine combineLogicBlendIntoConditionalNegate for VSELECT nodes (PR43660) Attempt to use combineLogicBlendIntoConditionalNegate for (select M, (sub 0, X), X) -> (sub (xor X, M), M) We limit this to cases that can't easily replace the VSELECT with a shuffle (non-constant masks) or where a BLENDV is likely to occur (which tends to result in slower codegen).	2020-01-05 18:50:44 +00:00
Simon Pilgrim	6a6e6f04ec	[X86] Move combineLogicBlendIntoConditionalNegate before combineSelect. NFCI. Updates function order in preparation of future fix for PR43660	2020-01-05 17:17:41 +00:00
Simon Pilgrim	3db84f142a	[X86] Merge (identical) LowerGC_TRANSITION_START and LowerGC_TRANSITION_END (NFC) Silences a copy+paste analyzer warning - all they are doing are inserting NOOPs in exactly the same way.	2020-01-05 15:24:57 +00:00
Craig Topper	2875cc6b29	[X86] Improve for v2i32->v2f64 uint_to_fp This uses an alternative implementation of this conversion derived from our v2i32->v2f32 handling. We can zero extend the v2i32 to v2i64, or it with the bit representation of 2.0^52 which will give us 2.0^52 plus the 32-bit integer since double's mantissa is 52 bits. Then we just need to subtract 2.0^52 as a double and let the floating point unit normalize the remaining bits into a valid double. This is less instructions then our previous code, but does require a port 5 shuffle for the zero extend or unpack. Differential Revision: https://reviews.llvm.org/D71945	2020-01-03 11:39:08 -08:00
Reid Kleckner	9c2b72821b	Move tail call disabling code to target independent code When the "disable-tail-calls" attribute was added, checks were added for it in various backends. Now this code has proliferated, and it is something the target is responsible for checking. Move that responsibility back to the ISels (fast, global, and SD). There's no major functionality change, except for targets that never implemented this check. This LLVM attribute was originally added in `d9699bc7bd` (2015). Reviewers: echristo, MaskRay Differential Revision: https://reviews.llvm.org/D72118	2020-01-03 11:27:41 -08:00
Craig Topper	bd46e29742	[X86] Re-enable lowerUINT_TO_FP_vXi32 under fast-math by using an FSUB instead of an FADD. Summary: We previously disabled this under fast math due to aggressive reassociation by the machine combiner. But I think we can work around this by using a FSUB instead of FADD for the first operation. This matches the similar algorithm we do for uint_to_fp i64->f64 in TargetLowering::expandUINT_TO_FP. If reassociation hasn't been a problem for that, hopefully its not a problem here. Reviewers: RKSimon, spatel, scanon Reviewed By: spatel Subscribers: hiraditya, llvm-commits Tags: #llvm Differential Revision: https://reviews.llvm.org/D71968	2020-01-02 21:46:53 -08:00
Wang, Pengfei	60333a5317	[X86] Enable strict FP by default and remove option -disable-strictnode-mutation. NFCI.	2020-01-03 10:59:34 +08:00
Wang, Pengfei	9dc9e0ea64	[X86] Optimization of inserting vxi1 sub vector into vXi1 vector Summary: After bugfix the undef value case here, we used more operations to implement inserting vxi1 sub vector into vXi1 vector, I optimize it by use less operations. The history information at https://reviews.llvm.org/D68311 Reviewers: craig.topper, LuoYuanke, yubing, annita.zhang, pengfei, LiuChen3, RKSimon Reviewed By: craig.topper Subscribers: hiraditya, llvm-commits Patch by Xiang Zhang (xiangzhangllvm) Differential Revision: https://reviews.llvm.org/D71917	2020-01-03 09:25:25 +08:00
Craig Topper	c36763d894	[X86] Call SimplifyMultipleUseDemandedBits from combineVSelectToBLENDV if the condition is used by something other than select conditions. We might be able to bypass some nodes on the condition path. Differential Revision: https://reviews.llvm.org/D71984	2020-01-01 11:16:52 -08:00
Liu, Chen3	8af492ade1	add strict float for round operation Differential Revision: https://reviews.llvm.org/D72026	2020-01-01 20:42:12 +08:00
Craig Topper	26bdc603f7	[X86] Constant fold KSHIFT of an all zeros vector to just an all zeros vector.	2019-12-31 15:57:39 -08:00
Craig Topper	1cc8a74de3	[X86] Use carry flag from add for (seteq (add X, -1), -1). If we just subtracted 1 and are checking if the result is -1. We can use the carry flag from the ADD instead of an explicit CMP. I'm using the same checks for the add users as EmitTest. Fixes one case from PR44412 Differential Revision: https://reviews.llvm.org/D72019	2019-12-31 15:05:23 -08:00
Craig Topper	e898ba2d15	[X86] Slightly improve our attempted error recovery for 64-bit -mno-sse2 in LowerCallResult to use FP1 if there are two return values. If the return value is a struct of 2 doubles we need two return registers. If SSE2 is disabled we can't return in XMM registers like the ABI says. After logging an error we attempt to recover by using FP0 instead of an XMM register. But if the return needs two registers, we may have already used FP0. So if the register we were supposed to copy to is XMM1, copy to FP1 in the recovery instead. This seems to fix the assertion/crash in PR44413.	2019-12-31 00:16:13 -08:00
Craig Topper	47a2fd2df4	[X86] Add X86ISD::PCMPGT to SimplifyMultipleUseDemandedBitsForTargetNode. If only the sign bit is demanded, and the LHS is all zeroes, then we can bypass the PCMPGT.	2019-12-30 10:50:25 -08:00
Craig Topper	266cd7717c	[X86] Use APInt::isOneValue and ConstantSDNode::isOne. NFC These are implemented slightly more efficiently than comparing to 1 in the case that the value is more than 64 bits.	2019-12-29 17:35:49 -08:00
Craig Topper	b2f19320dc	[X86] Use isOneConstant to simplify some code. NFC	2019-12-29 16:53:38 -08:00
Craig Topper	599d070910	[X86] Remove dyn_casts to ConstantSDNode for operand 1 of X86ISD::VSRLI/VSRAI/VSRLI. Use getConstantOperandVal and APInt operations. These nodes should only ever be formed with an i8 TargetConstant so we don't need to check for it to be a constant. It's also always 8-bits so we don't need to use APInt compare functions.	2019-12-29 16:53:38 -08:00
Craig Topper	a5c96e326a	[X86] Stop accidentally custom type legalizing v4i32->v4f32 on SSE1 only targets. We had a Custom operation action for v4i32 on SSE1. But since v4i32 isn't legal until SSE2 this was not what was intended. The code that get executed was intended for op legalization and creates a bunch of v4i32 nodes that all end up scalarized.	2019-12-28 23:11:48 -08:00
Craig Topper	ae321faeed	[X86] Remove a redundant (scalar_to_vector (extract_vector_elt X))) in LowerUINT_TO_FP_i32. NFCI	2019-12-28 21:49:22 -08:00
Craig Topper	fca4736874	[X86] Allow v2i32->v2f32 strict and non-strict uint_to_fp to be widened to v4i32->v4f32 under avx512. With avx512vl we get v4i32->v4f32 uint_to_fp instructions. With avx512f we get v16i32->v16f32 instructions which we can use to emulate v4i32->v4f32.	2019-12-27 00:28:44 -08:00
Craig Topper	20aab49492	[X86] Custom widen v2i32->v2f32 strict_sint_to_fp to avoid scalarization.	2019-12-27 00:28:44 -08:00
Fangrui Song	7a7334663c	Delete llvm.{sig,}{setjmp,longjmp} remnant after r136821 Intrinsic has incorrect argument type! i32 (i32) @llvm.setjmp wipes tear	2019-12-27 00:00:14 -08:00
Craig Topper	ecbaf152f8	[X86] Custom widen 128/256-bit vXi32 fp_to_uint on avx512f targets without avx512vl. Similar for vXi64 on avx512dq without avx512vl. Summary: Previously we did this with isel patterns that used garbage in the widened part of the source. But that's not valid for strictfp. So now we custom widen and use zeroes for the widened elemens for strictfp. This replaces D71864. Reviewers: RKSimon, spatel, andrew.w.kaylor, pengfei, LiuChen3 Reviewed By: pengfei Subscribers: hiraditya, llvm-commits Tags: #llvm Differential Revision: https://reviews.llvm.org/D71879	2019-12-26 22:04:40 -08:00
Craig Topper	50fb3957c1	[X86] Custom widen strict v2f32->v2i32 by padding with zeroes. For non-strict, generic type legalization will take care of this, but that doesn't happen currently for strict nodes.	2019-12-26 21:45:18 -08:00
Fangrui Song	c4a97b64e3	[X86] Fix -Wmisleading-indentation after D71892	2019-12-26 21:41:16 -08:00
Craig Topper	53ee806d93	[X86][FPEnv] Promote some float strictfp operations to double on i686-pc-windows-msvc to match what we do for non-strict. The float libcalls are inlined in MSVC's math header where they just cast to double and use the double libcall. Do the same when we emit libcalls.	2019-12-26 20:22:24 -08:00
Craig Topper	a5d266b9cf	[X86] Add custom legalization for strict_uint_to_fp v2i32->v2f32. I believe the algorithm we use for non-strict is exception safe for strict. The fsub won't generate any exceptions. After it we will have an exact version of the i32 integer in a double. Then we just round it to f32. That rounding will generate a precision exception if it can't be represented exactly.	2019-12-26 19:10:26 -08:00
Liu, Chen3	1a7b69f5dd	add custom operation for strict fpextend/fpround Differential Revision: https://reviews.llvm.org/D71892	2019-12-27 08:28:33 +08:00
Eric Christopher	1584e2f987	Remove SrcVT only used in an assert and propagate query.	2019-12-26 15:28:32 -08:00
Craig Topper	f953882113	[X86] Custom widen 128/256-bit vXi32 uint_to_fp on avx512f targets without avx512vl. Similar for vXi64 sint_to_fp/uint_to_fp on avx512dq without avx512vl. Previously we widened these through isel patterns, but that didn't work for STRICT_ nodes. Those need to be padded with zeroes in the upper bits which is harder to do in isel patterns.	2019-12-26 14:46:56 -08:00
Craig Topper	90ff34e6ab	[X86] Add custom widening for v2i32->v2f64 strict_uint_to_fp with AVX512F, but not AVX512VL. Previously we were widening with isel patterns, but that wasn't exception safe for strict FP. So now we widen to v4i32->v4f64 during type legalization. And then let op legalization further widen to v8i32->v8f64. The vec_int_to_fp.ll changes are caused by us no longer narrowing extracts of strict_uint_to_fp to the v4i32->v2f64 instruction without AVX512VL only to have isel rewiden it. Now we just keep it wide throughout. So we don't have an opportunity to narrow the load.	2019-12-26 13:40:56 -08:00
Craig Topper	bb0138729b	[X86] Add custom widening for v2f64->v2i32 strict_fp_to_uint with avx512f, but not avx512vl. AVX512F added instruction for vector fp_to_uint conversions. With AVX512VL we can use a specific instruction that does v2f64->v4i32 with zeroes in the 2 extra elements. For non-strict nodes without AVX512VL we relied on type legalization to turn it to v4f64->v4i32 which would later be widened by op legalization to v8f64->v8i32. But type legalization doesn't currently widen strict nodes since it doesn't know how to safely and efficiently pad the extra elements. But for X86 we know padding with zeroes is safe and efficient so do that ourselves.	2019-12-26 12:42:27 -08:00
Craig Topper	c91bf72e2c	[X86] Merge the SINT_TO_FP/UINT_TO_FP handlers in ReplaceNodeResults since the AVX512DQ+AVX512VL code is very similar in both. NFC	2019-12-26 08:58:34 -08:00
Craig Topper	4e6b0dd681	[X86] Add custom lowering for v2i64->v2f32 strict_sint_to_fp/strict_uint_to_fp for avx512dq+avx512vl targets. With avx512dq+avx512vl we have instruction that implements this and places zeroes in the upper 64-bits of the destination xmm register.	2019-12-26 08:58:34 -08:00
Wang, Pengfei	472bded3ed	[X86] Enable STRICT_SINT_TO_FP/STRICT_UINT_TO_FP on X86 backend Summary: Enable STRICT_SINT_TO_FP/STRICT_UINT_TO_FP on X86 backend Reviewers: craig.topper, RKSimon, LiuChen3, uweigand, andrew.w.kaylor Subscribers: hiraditya, llvm-commits, LuoYuanke Tags: #llvm Differential Revision: https://reviews.llvm.org/D71871	2019-12-26 08:15:13 +08:00
Craig Topper	c5b4a2386b	[X86] Use zero vector to extend to 512-bits for strict_fp_to_uint v2i1->v2f64 on targets with AVX512F, but not AVX512VL. In the worst case, this requires a 128-bit move instruction to implicitly zero the upper bits. In the common case, we should recognize the producing instruction already zeroed the upper bits.	2019-12-25 10:46:00 -08:00
Craig Topper	2498d88259	[X86] Merge together some common code in LowerFP_TO_INT now that we have STRICT_CVTTP2SI/STRICT_CVTTP2UI nodes. NFC	2019-12-25 09:57:27 -08:00
Liu, Chen3	8304781cae	Add missing strict_fp_to_int Differential Revision: https://reviews.llvm.org/D71867	2019-12-25 16:10:10 +08:00
Craig Topper	c06e53119b	[X86] Use 128-bit vector instructions for f32/f64->i64 conversions on 32-bit targets with avx512dq and avx512vl instructions. On 32-bit targets we can't use the scalar instruction so we insert the scalar into a vector and use packed conversions. Previously we used either v4f32->v4i64 or v4f64->v4i64 to avoid some complexity creating target specific ISD opcodes for v4f32->v2i64. But this causes extra vzeroupper instructions and possibly frequency throttling on Intel CPUs. This patch changes this to create a 128-bit vector and uses a target specific ISD opcode if needed.	2019-12-24 11:20:10 -08:00
Craig Topper	a21beccea2	[X86] Add STRICT versions of CVTTP2SI, CVTTP2UI, CMPM, and CMPP. Differential Revision: https://reviews.llvm.org/D71850	2019-12-24 10:07:04 -08:00
Ulrich Weigand	0d3f782e41	[FPEnv][X86] More strict int <-> FP conversion fixes Fix several several additional problems with the int <-> FP conversion logic both in common code and in the X86 target. In particular: - The STRICT_FP_TO_UINT expansion emits a floating-point compare. This compare can raise exceptions and therefore needs to be a strict compare. I've made it signaling (even though quiet would also be correct) as signaling is the more usual default for an LT. This code exists both in common code and in the X86 target. - The STRICT_UINT_TO_FP expansion algorithm was incorrect for strict mode: it emitted two STRICT_SINT_TO_FP nodes and then used a select to choose one of the results. This can cause spurious exceptions by the STRICT_SINT_TO_FP that ends up not chosen. I've fixed the algorithm to use only a single STRICT_SINT_TO_FP instead. - The !isStrictFPEnabled logic in DoInstructionSelection would sometimes do the wrong thing because it calls getOperationAction using the result VT. But for some opcodes, incuding [SU]INT_TO_FP, getOperationAction needs to be called using the operand VT. - Remove some (obsolete) code in X86DAGToDAGISel::Select that would mutate STRICT_FP_TO_[SU]INT to non-strict versions unnecessarily. Reviewed by: craig.topper Differential Revision: https://reviews.llvm.org/D71840	2019-12-23 21:11:45 +01:00
Sanjay Patel	8cefc37be5	[DAGCombine] visitEXTRACT_SUBVECTOR - 'little to big' extract_subvector(bitcast()) support This moves the X86 specific transform from rL364407 into DAGCombiner to generically handle 'little to big' cases (for example: extract_subvector(v2i64 bitcast(v16i8))). This allows us to remove both the x86 implementation and the aarch64 bitcast(extract_subvector(bitcast())) combine. Earlier patches that dealt with regressions initially exposed by this patch: rG5e5e99c041e4 rG0b38af89e2c0 Patch by: @RKSimon (Simon Pilgrim) Differential Revision: https://reviews.llvm.org/D63815	2019-12-23 10:11:45 -05:00
Craig Topper	de2378b4f3	[X86] Fix a KNL miscompile caused by combineSetCC swapping LHS/RHS variables before a later use. The setcc operands are copied into LHS and RHS variables at the top of the function. We also capture the condition code. A later piece of code swaps the operands and changing the CC variable as part of a canonicalization to make some other checks simpler. But we might not make the transform we canonicalized for. So we continue on through the function where we can use the swapped LHS/RHS variables and access the original condition code operand instead of the modified CC variable. This leads to a setcc being created with the original condition code, but with swapped operands. To mitigate this, this patch does a couple things. The LHS/RHS/CC variables are made const to keep them from being modified like this again. The transform that needs the swap now uses temporary copies of the variables. And the transform that used the original condition code operand has been altered to use the CC variable we cached originally. Either of these changes are enough to fix the issue, but doing both to make this code very safe. I also considered rewriting the swap code in some way to check both permutations without explicitly swapping or needing temporary variables, but held off on that. Differential Revision: https://reviews.llvm.org/D71736	2019-12-20 11:24:45 -08:00
Craig Topper	bf507d4259	[X86] Make EmitCmp into a static function and explicitly return chain result for STRICT_FCMP. NFCI The only thing its getting from the X86TargetLowering class is the subtarget which we can easily pass. This function only has one call site now since this might help the compiler inline it. Explicitly return both the flag result and the chain result for STRICT_FCMP nodes. This removes an assumption in the caller that getValue(1) is the right way to get the chain.	2019-12-19 23:03:15 -08:00
Craig Topper	9b6fafa399	[X86] Directly call EmitTest in two places instead of creating a null constant and calling EmitCmp. NFCI EmitCmp will just immediately call EmitTest and discard the null constant only to have EmitTest create it again if it doesn't fold. So just skip all that and go directly to EmitTest.	2019-12-19 23:03:06 -08:00
Liu, Chen3	2f932b5729	Enable STRICT_FP_TO_SINT/UINT on X86 backend This patch is mainly for custom lowering the vector operation. Differential Revision: https://reviews.llvm.org/D71592	2019-12-19 14:49:13 +08:00
Wang, Pengfei	1949235d13	[X86] Add strict fma support Summary: Add strict fma support Reviewers: craig.topper, RKSimon, LiuChen3 Subscribers: hiraditya, llvm-commits, LuoYuanke Tags: #llvm Differential Revision: https://reviews.llvm.org/D71604	2019-12-18 11:44:00 +08:00
Craig Topper	004fdbe041	[X86] Manually format some setOperationAction calls to line up arguments to improve readability. NFC	2019-12-17 16:11:31 -08:00
Kevin P. Neal	b1d8576b0a	This adds constrained intrinsics for the signed and unsigned conversions of integers to floating point. This includes some of Craig Topper's changes for promotion support from D71130. Differential Revision: https://reviews.llvm.org/D69275	2019-12-17 10:06:51 -05:00
Alex Richardson	be15dfa88f	[NFC] Use EVT instead of bool for getSetCCInverse() Summary: The use of a boolean isInteger flag (generally initialized using VT.isInteger()) caused errors in our out-of-tree CHERI backend (https://github.com/CTSRD-CHERI/llvm-project). In our backend, pointers use a separate ValueType (iFATPTR) and therefore .isInteger() returns false. This meant that getSetCCInverse() was using the floating-point variant and generated incorrect code for us: `(void )0x12033091e < (void )0xffffffffffffffff` would return false. Committing this change will significantly reduce our merge conflicts for each upstream merge. Reviewers: spatel, bogner Reviewed By: bogner Subscribers: wuzish, arsenm, sdardis, nemanjai, jvesely, nhaehnle, hiraditya, kbarton, jrtc27, atanasyan, jsji, llvm-commits Tags: #llvm Differential Revision: https://reviews.llvm.org/D70917	2019-12-13 12:22:03 +00:00
Sanjay Patel	cdf5cfea8e	Revert "[SDAG] remove use restriction in isNegatibleForFree() when called from getNegatedExpression()" This reverts commit `d1f0bdf2d2`. The patch can cause infinite loops in DAGCombiner.	2019-12-11 16:56:58 -05:00
Sanjay Patel	d1f0bdf2d2	[SDAG] remove use restriction in isNegatibleForFree() when called from getNegatedExpression() This is an alternate fix for the bug discussed in D70595. This also includes minimal tests for other in-tree targets to show the problem more generally. We check the number of uses as a predicate for whether some value is free to negate, but that use count can change as we rewrite the expression in getNegatedExpression(). So something that was marked free to negate during the cost evaluation phase becomes not free to negate during the rewrite phase (or the inverse - something that was not free becomes free). This can lead to a crash/assert because we expect that everything in an expression that is negatible to be handled in the corresponding code within getNegatedExpression(). This patch skips the use check during the rewrite phase. So we determine that some expression isNegatibleForFree (identically to without this patch), but during the rewrite, don't rely on use counts to decide how to create the optimal expression. Differential Revision: https://reviews.llvm.org/D70975	2019-12-11 13:30:39 -05:00
Craig Topper	935d41e4bd	[X86] Split v64i1 arguments into 2 v32i1s that will be promoted to v32i8 under min-legal-vector-width=256 This is an improvement to `88dacbd436`	2019-12-10 17:29:02 -08:00
Wang, Pengfei	21bc8631fe	[FPEnv][X86] Constrained FCmp intrinsics enabling on X86 Summary: This is a follow up of D69281, it enables the X86 backend support for the FP comparision. Reviewers: uweigand, kpn, craig.topper, RKSimon, cameron.mcinally, andrew.w.kaylor Subscribers: hiraditya, llvm-commits, annita.zhang, LuoYuanke, LiuChen3 Tags: #llvm Differential Revision: https://reviews.llvm.org/D70582	2019-12-11 08:23:09 +08:00
Craig Topper	88dacbd436	[X86] Go back to considering v64i1 as a legal type under min-legal-vector-width=256. Scalarize v64i1 arguments and shuffles under min-legal-vector-width=256. This reverts `3e1aee2ba7` in favor of a different approach. Scalarizing isn't great codegen, but making the type illegal was interfering with k constraint in inline assembly.	2019-12-10 15:07:55 -08:00
Liu, Chen3	bbf7860b93	add support for strict operation fpextend/fpround/fsqrt on X86 backend Differential Revision: https://reviews.llvm.org/D71184	2019-12-10 09:04:28 +08:00
Amara Emerson	84fdd9d7a5	[X86] Fix prolog/epilog mismatch for stack protectors on win32-macho. The xor'ing behaviour is only used for msvc/crt environments, when we're targeting macho the guard load code doesn't know about the xor in the epilog. Disable xor'ing when targeting win32-macho to be consistent. Differential Revision: https://reviews.llvm.org/D71095	2019-12-06 14:44:56 -08:00
Craig Topper	28b573d249	[TargetLowering] Fix another potential FPE in expandFP_TO_UINT D53794 introduced code to perform the FP_TO_UINT expansion via FP_TO_SINT in a way that would never expose floating-point exceptions in the intermediate steps. Unfortunately, I just noticed there is still a way this can happen. As discussed in D53794, the compiler now generates this sequence: // Sel = Src < 0x8000000000000000 // Val = select Sel, Src, Src - 0x8000000000000000 // Ofs = select Sel, 0, 0x8000000000000000 // Result = fp_to_sint(Val) ^ Ofs The problem is with the Src - 0x8000000000000000 expression. As I mentioned in the original review, that expression can never overflow or underflow if the original value is in range for FP_TO_UINT. But I missed that we can get an Inexact exception in the case where Src is a very small positive value. (In this case the result of the sub is ignored, but that doesn't help.) Instead, I'd suggest to use the following sequence: // Sel = Src < 0x8000000000000000 // FltOfs = select Sel, 0, 0x8000000000000000 // IntOfs = select Sel, 0, 0x8000000000000000 // Result = fp_to_sint(Val - FltOfs) ^ IntOfs In the case where the value is already in range of FP_TO_SINT, we now simply compute Val - 0, which now definitely cannot trap (unless Val is a NaN in which case we'd want to trap anyway). In the case where the value is not in range of FP_TO_SINT, but still in range of FP_TO_UINT, the sub can never be inexact, as Val is between 2^(n-1) and (2^n)-1, i.e. always has the 2^(n-1) bit set, and the sub is always simply clearing that bit. There is a slight complication in the case where Val is a constant, so we know at compile time whether Sel is true or false. In that scenario, the old code would automatically optimize the sub away, while this no longer happens with the new code. Instead, I've added extra code to check for this case and then just fall back to FP_TO_SINT directly. (This seems to catch even slightly more cases.) Original version of the patch by Ulrich Weigand. X86 changes added by Craig Topper Differential Revision: https://reviews.llvm.org/D67105	2019-12-06 14:11:04 -08:00
Reid Kleckner	c089f02898	[X86] Don't setup and teardown memory for a musttail call Summary: musttail calls should not require allocating extra stack for arguments. Updates to arguments passed in memory should happen in place before the epilogue. This bug was mostly a missed optimization, unless inalloca was used and store to push conversion fired. If a reserved call frame was used for an inalloca musttail call, the call setup and teardown instructions would be deleted, and SP adjustments would be inserted in the prologue and epilogue. You can see these are removed from several test cases in this change. In the case where the stack frame was not reserved, i.e. call frame optimization fires and turns argument stores into pushes, then the imbalanced call frame setup instructions created for inalloca calls become a problem. They remain in the instruction stream, resulting in a call setup that allocates zero bytes (expected for inalloca), and a call teardown that deallocates the inalloca pack. This deallocation was unbalanced, leading to subsequent crashes. Reviewers: hans Subscribers: hiraditya, llvm-commits Tags: #llvm Differential Revision: https://reviews.llvm.org/D71097	2019-12-06 12:58:54 -08:00
Craig Topper	8267be2995	[X86] Make X86TargetLowering::BuildFILD return a std::pair of SDValues so we explicitly return the chain instead of calling getValue on the single SDValue. We shouldn't assume that the returned result can be used to get the other result. This is prep-work for strict FP where we will also need to pass the chain result along in more cases.	2019-12-05 17:54:21 -08:00
Liu, Chen3	3041434450	Add strict fp support for instructions fadd/fsub/fmul/fdiv Differential Revision: https://reviews.llvm.org/D68757	2019-12-06 09:44:33 +08:00
Craig Topper	3d43c73f26	[X86] Remove override of shouldUseStrictFP_TO_INT for fp80. NFC I suspect this became unnecessary after r354161. Prior to that we may have been going through the default expansion of FP_TO_UINT on 64-bit targets and then ending up back in Custom X86 handling to handle the FP_TO_SINT for it. Now we just Custom handle the FP_TO_UINT directly. We already need to handle it for 32-bit mode during type legalization so we wouldn't save any code by using the default expansion on 64-bit.	2019-12-04 17:58:10 -08:00
Amy Huang	9e978bb01c	Add support for lowering 32-bit/64-bit pointers Summary: This follows a previous patch that changes the X86 datalayout to represent mixed size pointers (32-bit sext, 32-bit zext, and 64-bit) with address spaces (https://reviews.llvm.org/D64931) This patch implements the address space cast lowering to the corresponding sign extension, zero extension, or truncate instructions. Related to https://bugs.llvm.org/show_bug.cgi?id=42359 Reviewers: rnk, craig.topper, RKSimon Subscribers: hiraditya, llvm-commits Tags: #llvm Differential Revision: https://reviews.llvm.org/D69639	2019-12-04 11:39:03 -08:00
Craig Topper	8f73a93b2d	[X86] Add support for STRICT_FP_TO_UINT/SINT from fp128.	2019-11-27 18:38:32 -08:00
Craig Topper	cfce8f2cfb	[X86] Add strict fp support for operations of X87 instructions This is the following patch of D68854. This patch adds basic operations of X87 instructions, including +, -, *, / , fp extensions and fp truncations. Patch by Chen Liu(LiuChen3) Differential Revision: https://reviews.llvm.org/D68857	2019-11-26 10:59:41 -08:00
David Green	b5315ae8ff	[Codegen][ARM] Add addressing modes from masked loads and stores MVE has a basic symmetry between it's normal loads/store operations and the masked variants. This means that masked loads and stores can use pre-inc and post-inc addressing modes, just like the standard loads and stores already do. To enable that, this patch adds all the relevant infrastructure for treating masked loads/stores addressing modes in the same way as normal loads/stores. This involves: - Adding an AddressingMode to MaskedLoadStoreSDNode, along with an extra Offset operand that is added after the PtrBase. - Extending the IndexedModeActions from 8bits to 16bits to store the legality of masked operations as well as normal ones. This array is fairly small, so doubling the size still won't make it very large. Offset masked loads can then be controlled with setIndexedMaskedLoadAction, similar to standard loads. - The same methods that combine to indexed loads, such as CombineToPostIndexedLoadStore, are adjusted to handle masked loads in the same way. - The ARM backend is then adjusted to make use of these indexed masked loads/stores. - The X86 backend is adjusted to hopefully be no functional changes. Differential Revision: https://reviews.llvm.org/D70176	2019-11-26 16:21:01 +00:00
Craig Topper	1b20908334	[X86] Return Op instead of SDValue() for lowering flags_read/write intrinsics Returning SDValue() means we didn't handle it and the common code should try to expand it. But its a target intrinsic so expanding won't do anything and just leave the node alone. But it will print confusing debug messages. By returning Op we tell the common code that the node is legal and shouldn't receive any further processing.	2019-11-25 23:13:30 -08:00
Craig Topper	c43b8ec735	[X86] Add support for STRICT_FP_ROUND/STRICT_FP_EXTEND from/to fp128 to/from f32/f64/f80 in 64-bit mode. These need to emit a libcall like we do for the non-strict version. 32-bit mode needs to SoftenFloat support to be implemented for strict FP nodes. Differential Revision: https://reviews.llvm.org/D70504	2019-11-25 18:18:39 -08:00
Simon Pilgrim	5d9a259ad5	[X86][SSE] Split off generic isLaneCrossingShuffleMask helper. NFC. Avoid MVT dependency which will be needed in a future patch.	2019-11-23 12:41:03 +00:00
Craig Topper	b29e5cdb7c	[X86] Add test cases for most of the constrained fp libcalls with fp128. Add explicit setOperation actions for some to match their none strict counterparts. This isn't required, but makes the code self documenting that we didn't forget about strict fp. I've used LibCall instead of Expand since that's more explicitly what we want. Only lrint/llrint/lround/llround are missing now.	2019-11-21 18:17:59 -08:00
Craig Topper	fc4020dbbe	[X86] Mark fp128 FMA as LibCall instead of Expand. Add STRICT_FMA as well. The Expand code would fall back to LibCall, but this makes it more explicit.	2019-11-21 18:17:57 -08:00
Craig Topper	7696b99258	[LegalizeDAG][X86] Add support for turning STRICT_FADD/SUB/MUL/DIV into libcalls. Use it for fp128 on x86-64. This requires a minor hack for f32/f64 strict fadd/fsub to avoid turning those into libcalls.	2019-11-21 16:19:25 -08:00
Craig Topper	95f44cf44a	[X86] Mark vector STRICT_FADD/STRICT_FSUB as Legal and add mutation to X86ISelDAGToDAG The prevents LegalizeVectorOps from scalarizing them. We'll need to remove the X86 mutation code when we add isel patterns.	2019-11-21 16:19:18 -08:00
Hiroshi Yamauchi	52e377497d	[PGO][PGSO] DAG.shouldOptForSize part. Summary: (Split of off D67120) SelectionDAG::shouldOptForSize changes for profile guided size optimization. Reviewers: davidxl Subscribers: hiraditya, llvm-commits Tags: #llvm Differential Revision: https://reviews.llvm.org/D70095	2019-11-21 14:16:00 -08:00
Craig Topper	1439059cc7	[X86] Change legalization action for f128 fadd/fsub/fmul/fdiv from Custom to LibCall. The custom code just emits a libcall, but we can do the same with generic code. The only difference is that the generic code can form tail calls where the custom code couldn't. This is responsible for the test changes. This avoids needing to modify the Custom handling for strict fp.	2019-11-21 11:44:29 -08:00
Craig Topper	27da569a7a	[X86] Fix i16->f128 sitofp to promote the i16 to i32 before trying to form a libcall. Previously one of the test cases added here gave an error.	2019-11-20 17:09:32 -08:00
Craig Topper	5f3bf5967b	[X86] Fix f128->i16 fptosi to promote the i16 to i32 before trying to form a libcall. Previously one of the test cases added here gave an error.	2019-11-20 17:09:31 -08:00
Craig Topper	7488c0a6f5	[X86] Mark vector STRICT_FP_ROUND as Legal instead of Custom. The Custom handler doesn't do anything for these nodes anyway. SelectionDAGISel won't mutate them if they are Legal or Custom. X86 has custom code for mutating them due to missing isel patterns. When the isel patterns are added Legal will be the right answer. So go ahead a change it now since that's where we'll end up.	2019-11-20 13:03:51 -08:00
Reid Kleckner	606a2bd621	[musttail] Don't forward AL on Win64 AL is only used for varargs on SysV platforms. Don't forward it on Windows. This allows control flow guard to set up an extra hidden parameter in RAX, as described in PR44049. This also has the effect of freeing up RAX for use in virtual member pointer thunks, which may also be a nice little code size improvement on Win64. Fixes PR44049 Reviewers: ajpaverd, efriedma, hans Differential Revision: https://reviews.llvm.org/D70413	2019-11-19 16:54:00 -08:00
Craig Topper	c4b41e8d1d	[LegalizeDAG][X86] Enable STRICT_FP_TO_SINT/UINT to be promoted Differential Revision: https://reviews.llvm.org/D70220	2019-11-19 16:14:37 -08:00
Craig Topper	85589f8077	[X86] Add custom type legalization and lowering for scalar STRICT_FP_TO_SINT/UINT This is a first pass at Custom lowering for these operations. I also updated some of the vector code where it was obviously easy and straightforward. More work needed in follow up. This enables these operations to be handled with X87 where special rounding control adjustments are needed to perform a truncate. Still need to fix Promotion in the target independent code in LegalizeDAG. llrint/llround split into separate test file because we can't make a strict libcall properly yet either and we need to do that when i64 isn't a legal type. This does not include any isel support. So we still rely on the mutation in SelectionDAGIsel to remove the strict from this stuff later. Except for the X87 stuff which goes through custom nodes that already had chains. Differential Revision: https://reviews.llvm.org/D70214	2019-11-19 16:05:22 -08:00
Matt Arsenault	b696b9dba7	DAG: Add function context to isFMAFasterThanFMulAndFAdd AMDGPU needs to know the FP mode for the function to answer this correctly when this is removed from the subtarget. AArch64 had to make this more complicated by using this from an IR hook, so add an IR typed overload.	2019-11-19 19:25:26 +05:30
Simon Pilgrim	bbf4af3109	[X86][SSE] Remove XFormVExtractWithShuffleIntoLoad to prevent legalization infinite loops (PR43971) As detailed in PR43971/D70267, the use of XFormVExtractWithShuffleIntoLoad causes issues where we end up in infinite loops of extract(targetshuffle(vecload)) -> extract(shuffle(vecload)) -> extract(vecload) -> extract(targetshuffle(vecload)), there are just too many legalization checks at every stage that we can't guarantee that extract(shuffle(vecload)) -> scalarload can occur. At the moment we see a number of minor regressions as we don't fold extract(shuffle(vecload)) -> scalarload before legal ops, these can be addressed in future patches and extension of X86ISelLowering's combineExtractWithShuffle.	2019-11-19 11:55:44 +00:00
Graham Hunter	3f08ad611a	[SVE][CodeGen] Scalable vector MVT size queries * Implements scalable size queries for MVTs, split out from D53137. * Contains a fix for FindMemType to avoid using scalable vector type to contain non-scalable types. * Explicit casts for several places where implicit integer sign changes or promotion from 32 to 64 bits caused problems. * CodeGenDAGPatterns will treat scalable and non-scalable vector types as different. Reviewers: greened, cameron.mcinally, sdesmalen, rovka Reviewed By: rovka Differential Revision: https://reviews.llvm.org/D66871	2019-11-18 12:30:59 +00:00
Craig Topper	f7e9d81a8e	[X86] Don't set the operation action for i16 SINT_TO_FP to Promote just because SSE1 is enabled. Instead do custom promotion in the handler so that we can still allow i16 to be used with fp80. And f64 without sse2.	2019-11-13 14:07:56 -08:00
Craig Topper	787595b2e7	[X86] Fix typo in comment. NFC	2019-11-13 14:07:55 -08:00
Craig Topper	fee9067261	[X86] Move all the FP_TO_XINT/XINT_TO_FP setOperationActions into the same !useSoftFloat block. Qualify all of the Promote actions for these with !useSoftFloat too. NFCI The Promote action doesn't apply until LegalizeDAG. By the time we get there, we would have already softened all the FP operations if useSoftFloat was true. So there wouldn't be any operation left to Promote.	2019-11-13 14:07:54 -08:00
Craig Topper	a4b7613a49	[X86] Remove setOperationAction for FP_TO_SINT v8i16. This is no longer needed after widening legalization as we custom legalize v8i8 ourselves. Added entries to the cost model, but bumped the cost slightly to account for the truncate shuffle that wasn't costed before.	2019-11-12 22:45:52 -08:00
Craig Topper	3e1aee2ba7	[X86] Don't consider v64i1 as a legal type unless v64i8 is also a legal type. This avoids some nasty issues with argument passing and lowering of arbitrary v64i8 shuffles.	2019-11-12 14:56:02 -08:00
Craig Topper	0f04ffc073	[X86] Only pass v64i8/v32i16 as v16i32 on non-avx512bw targets if the v16i32 type won't be split by prefer-vector-width=256 Otherwise just let the v64i8/v32i16 types be split to v32i8/v16i16. In reality this shouldn't happen because it means we have a 512-bit vector argument, but min-legal-vector-width says a value less than 512. But a 512-bit argument should have been factored into the preferred vector width.	2019-11-12 14:56:01 -08:00
Craig Topper	ff1504da6f	[X86] Update stale comment. NFC	2019-11-11 23:55:12 -08:00
Craig Topper	578f3b5dce	[X86] Remove setOperationAction lines that say to promote MVT::i1 MVT::i1 should be removed by type legalization before we reach any code that would act on the promote action. Mainly to avoid replicating this for strict FP versions of these operations.	2019-11-11 18:35:57 -08:00
Craig Topper	6c86d6efaf	[X86] Remove some else branches after checking for !useSoftFloat() that set operations to Expand. If we're using soft floats, then these operations shoudl be softened during type legalization. They'll never get to LegalizeVectorOps or LegalizeDAG so they don't need to be Expanded there.	2019-11-11 16:32:19 -08:00
Eli Friedman	5df3a87224	[AArch64][X86] Don't assume __powidf2 is available on Windows. We had some code for this for 32-bit ARM, but this doesn't really need to be in target-specific code; generalize it. (I think this started showing up recently because we added an optimization that converts pow to powi.) Differential Revision: https://reviews.llvm.org/D69013	2019-11-08 12:43:21 -08:00
Craig Topper	17eb12fa6d	[X86] Remove unused variable. NFC	2019-11-06 22:53:48 -08:00
Craig Topper	1c8460d6e1	[X86] Remove dead code from combineStore. Leftovers from before we switched to widening legalization. Fixes PR43919.	2019-11-06 22:24:47 -08:00
Craig Topper	641d2e5232	[X86] Clamp large constant shift amounts for MMX shift intrinsics to 8-bits. The MMX intrinsics for shift by immediate take a 32-bit shift amount but the hardware for shifting by immediate only encodes 8-bits. For the intrinsic we don't require the shift amount to fit in 8-bits in the frontend because we don't check that its an immediate in the frontend. If its is not an immediate we move it to an MMX register and use the shift by register. But if it is an immediate we'll use the shift by immediate instruction. But we need to change the shift amount to 8-bits. We were previously doing this accidentally by masking it in the encoder. But this can make a large shift amount into a small in bounds shift amount. Instead we should clamp larger shift amounts to 255 so that the they don't become in bounds. Fixes PR43922	2019-11-06 13:03:18 -08:00
Dávid Bolvanský	ca7f5becf9	[X86ISelLowering] Fixed typo in assert. NFCI.	2019-11-06 20:04:15 +01:00
Sanjay Patel	8e34dd941c	[x86] avoid crashing when splitting AVX stores with non-simple type (PR43916) The store splitting transform was assuming a simple type (MVT), but that's not necessarily the case as shown in the test.	2019-11-06 09:28:41 -05:00
Simon Pilgrim	37cdac6344	[X86] LowerAVXExtend - fix dodgy self-comparison assert. PVS Studio noticed that we were asserting "VT.getVectorNumElements() == VT.getVectorNumElements()" instead of "VT.getVectorNumElements() == InVT.getVectorNumElements()".	2019-11-06 12:50:29 +00:00
Benjamin Kramer	5f158d8e21	[X86] Gate select->fmin/fmax transform on NoSignedZeros instead of UnsafeFPMath	2019-11-05 21:28:41 +01:00
Philip Reames	027aa27d95	[X86/Atomics] (Semantically) revert G246098, switch back to the old atomic example When writing an email for a follow up proposal, I realized one of the diffs in the committed change was incorrect. Digging into it revealed that the fix is complicated enough to require some thought, so reverting in the meantime. The problem is visible in this diff (from the revert): ; X64-SSE-LABEL: store_fp128: ; X64-SSE: # %bb.0: -; X64-SSE-NEXT: movaps %xmm0, (%rdi) +; X64-SSE-NEXT: subq $24, %rsp +; X64-SSE-NEXT: .cfi_def_cfa_offset 32 +; X64-SSE-NEXT: movaps %xmm0, (%rsp) +; X64-SSE-NEXT: movq (%rsp), %rsi +; X64-SSE-NEXT: movq {{[0-9]+}}(%rsp), %rdx +; X64-SSE-NEXT: callq __sync_lock_test_and_set_16 +; X64-SSE-NEXT: addq $24, %rsp +; X64-SSE-NEXT: .cfi_def_cfa_offset 8 ; X64-SSE-NEXT: retq store atomic fp128 %v, fp128* %fptr unordered, align 16 ret void The problem here is three fold: 1) x86-64 doesn't guarantee atomicity of anything larger than 8 bytes. Some platforms observably break this guarantee, others don't, but the codegen isn't considering this, so it's wrong on at least some platforms. 2) When I started to track down the problem, I discovered that DAGCombiner had stripped the atomicity off the store entirely. This comes down to idiomatic usage of DAG.getStore passing all MMO components separately as opposed to just passing the MMO. 3) On x86 (not -64), there are cases where 8 byte atomiciy is supported, but only for floating point operations. This would seem to imply that operation typing matters for correctness, and DAGCombine happily folds away bitcasts. I'm not 100% sure there's a problem here, but I'm not entirely sure there isn't either. I plan on returning to each issue in turn; sorry for the churn here.	2019-11-05 11:24:27 -08:00
Benjamin Kramer	00e53d912d	[X86] Specifically limit fmin/fmax commutativity to NoNaNs + NoSignedZeros The backend UnsafeFPMath flag is not a superset of all the others, so limit it to the exact bits needed.	2019-11-05 19:34:06 +01:00
Simon Pilgrim	9ad9d1531b	[X86] Convert ShrinkMode to scoped enum class. NFCI.	2019-11-04 15:35:20 +00:00
Simon Pilgrim	31ed36d044	[X86] SimplifyDemandedVectorElts - attempt to recombine target shuffle using DemandedElts mask (REAPPLIED) If we don't demand all elements, then attempt to combine to a simpler shuffle. At the moment we can only do this if Depth == 0 as combineX86ShufflesRecursively uses Depth to track whether the shuffle has really changed or not - we'll need to change this before we can properly start merging combineX86ShufflesRecursively into SimplifyDemandedVectorElts (see D66004). This reapplies rL368307 (reverted at rL369167) after the fix for the infinite loop reported at PR43024 was applied at rG3f087e38a2e7b87a5adaaac1c1b61e51220e7ff3	2019-11-04 11:37:57 +00:00
Simon Pilgrim	3f087e38a2	[X86][SSE] combineX86ShufflesRecursively - at Depth==0, only resolve KnownZero if it removes an input. This stops infinite loops where KnownUndef elements are converted to Zeroable, resulting in KnownZero elements which are then simplified (via SimplifyDemandedElts etc.) back to KnownUndef elements........ Prep fix for PR43024 which will allow rL368307 to be re-applied.	2019-11-03 21:10:47 +00:00
Simon Pilgrim	8f29e4407c	[X86][SSE] combineX86ShufflesRecursively - don't bother merging shuffles with empty roots. NFCI. This doesn't affect actual codegen, but is a minor refactor toward fixing PR43024 where we need to avoid excess changes (folding zeroables etc.) to the shuffle mask at Depth == 0.	2019-11-03 17:46:00 +00:00
Simon Pilgrim	297d96bb60	Fix uninitialized variable warning. NFCI.	2019-11-03 11:15:55 +00:00
Simon Pilgrim	254b8461ac	[X86] Move computeZeroableShuffleElements before getTargetShuffleAndZeroables. NFCI. Prep work toward merging some of the functionality.	2019-11-02 13:38:35 +00:00
Craig Topper	eeeb18cd07	[X86] Change the behavior of canWidenShuffleElements used by lowerV2X128Shuffle to match the behavior in lowerVectorShuffle with regards to zeroable elements. Previously we marked zeroable elements in a way that prevented the widening check from recognizing that it could widen. Now we only mark them zeroable if V2 is an all zeros vector. This matches what we do for widening elements in lowerVectorShuffle. Fixes PR43866.	2019-11-01 13:06:03 -07:00
Simon Pilgrim	9b0dfdf5e1	[X86][AVX] Add support for and/or scalar bool reduction with AVX512 mask registers combineBitcastvxi1 only handles bitcast->MOVMSK combines, with mask registers we use BITCAST directly.	2019-11-01 17:55:31 +00:00
Simon Pilgrim	ea27d82814	[X86] isFNEG - use switch() instead of if-else tree. NFCI. In a future patch this will avoid some checks which don't need to be done for some opcodes.	2019-11-01 17:09:04 +00:00
Simon Pilgrim	a780b94cd1	[X86][SSE] Convert computeZeroableShuffleElements to emit KnownUndef and KnownZero	2019-10-31 11:21:39 +00:00
Simon Pilgrim	f25f3d39df	[X86] Add FIXME comment to merge more of computeZeroableShuffleElements and getTargetShuffleAndZeroables	2019-10-30 18:30:01 +00:00
Simon Pilgrim	94a4a2c97f	[X86][SSE] combineX86ShuffleChain - use resolveZeroablesFromTargetShuffle helper. NFCI.	2019-10-30 18:30:01 +00:00
Simon Pilgrim	81399002ae	[X86] combineOrShiftToFunnelShift - use isOperationLegalOrCustom to check FSHL/FSHR support Remove hard wired legality check.	2019-10-30 11:52:22 +00:00
Simon Pilgrim	26655376fe	[X86] combineOrShiftToFunnelShift - use getShiftAmountTy instead of hardwiring to MVT::i8	2019-10-30 11:52:22 +00:00
Guillaume Chatelet	119b436da1	[Alignment] Use Align for TFI.getStackAlignment() in X86ISelLowering Summary: This is patch is part of a series to introduce an Alignment type. See this thread for context: http://lists.llvm.org/pipermail/llvm-dev/2019-July/133851.html See this patch for the introduction of the type: https://reviews.llvm.org/D64790 Reviewers: courbet, craig.topper, rnk Reviewed By: rnk Subscribers: rnk, hiraditya, llvm-commits Tags: #llvm Differential Revision: https://reviews.llvm.org/D69034	2019-10-30 10:35:13 +01:00
David Zarzycki	f68925d450	[X86] Make memcmp vector lowering handle arbitrary expansions Teach combineVectorSizedSetCCEquality() to handle arbitrary memcmp expansions but do not change any default policy for now. This also fixes a bug in the memcmp expansion itself when large displacements are needed. https://reviews.llvm.org/D69507	2019-10-30 09:12:57 +02:00
Philip Reames	2460989eab	[SelectionDAG] Enable lowering unordered atomics loads w/LoadSDNode (and stores w/StoreSDNode) by default Enable the new SelectionDAG representation for unordered loads and stores introduced in r371441 by default. As a reminder, the new lowering changes the representation of an unordered atomic load from an AtomicSDNode - which is essentially a black box which gets passed through without combines messing with it - to a LoadSDNode w/a atomic marker on the MMO. The later parallels the way we handle volatiles, and I've audited the code to ensure that every location which checks one checks the other. This has been fairly heavily fuzzed, and I examined diffs in a reasonable large corpus of assembly by hand, so I'm reasonable sure this is correct for the common case. Late in the review for this, it was discovered that I hadn't correctly handled cases which could be legalized into CAS operations. This points out that there's a strong bias in the IR of the frontend I'm working with towards only legal atomics. If there are problems with this patch, the most likely area will be legalization. Differential Revision: https://reviews.llvm.org/D69219	2019-10-29 12:46:24 -07:00
Craig Topper	772533d921	[X86] Narrow i64 compares with constant to i32 when the upper 32-bits are known zero. This catches some cases. There are probably ways to improve this. I tried doing it as a combine on the setcc, but that broke some cases involving flag reuse in place of test. I renamed the isX86CCUnsigned to isX86CCSigned and flipped its polarity to make it consistent with the similar functions for ISD::SETCC. This avoids calling EQ/NE as being signed or unsigned. Fixes PR43823. Differential Revision: https://reviews.llvm.org/D69499	2019-10-29 11:38:15 -07:00
Simon Pilgrim	501cf25839	[X86] Pull out combineOrShiftToFunnelShift helper. NFCI.	2019-10-29 15:29:51 +00:00
Craig Topper	3da269a248	[X86] Add a DAG combine to turn (and (bitcast (vXi1 (concat_vectors (vYi1 setcc), undef,))), C) into (bitcast (vXi1 (concat_vectors (vYi1 setcc), zero,))) The legalization of v2i1->i2 or v4i1->i4 bitcasts followed by a setcc can create an and after the bitcast. If we're lucky enough that the input to the bitcast is a concat_vectors where the first operand is a setcc that can natively 0 all the upper bits of ak-register, then we should replace the other operands of the concat_vectors with zero in order to remove the AND. With the AND removed we might be able to use a kortest on the result. Differential Revision: https://reviews.llvm.org/D69205	2019-10-28 11:27:01 -07:00
David Zarzycki	657e4240b1	[X86] Fix 48/96 byte memcmp code gen Detect scalar ISD::ZERO_EXTEND generated by memcmp lowering and convert it to ISD::INSERT_SUBVECTOR. https://reviews.llvm.org/D69464	2019-10-28 08:41:45 +02:00
David Zarzycki	11c920207a	[X86] Prefer KORTEST on Knights Landing or later for memcmp() PTEST and especially the MOVMSK instructions are slow on Knights Landing or later. As a bonus, this patch increases instruction parallelism by emitting: KORTEST(PCMPNEQ(a, b), PCMPNEQ(c, d)) == 0 Instead of: KORTEST(AND(PCMPEQ(a, b), PCMPEQ(c, d))) == ~0 https://reviews.llvm.org/D69157	2019-10-26 21:14:57 +03:00
Craig Topper	3dd0a896b6	[X86] Add a check for SSE2 to the top of combineReductionToHorizontal. Without this, we can create a PSADBW node that isn't legal.	2019-10-25 11:11:32 -07:00
Simon Pilgrim	a4d55a2c36	[X86] combineX86ShufflesRecursively - assert the root mask is legal. NFCI.	2019-10-23 07:33:29 -07:00
Simon Pilgrim	b446356bf3	[X86][SSE] Add OR(EXTRACTELT(X,0),OR(EXTRACTELT(X,1))) -> MOVMSK+CMP reduction combine llvm-svn: 375463	2019-10-21 22:36:31 +00:00
Simon Pilgrim	7c15c4fb17	[X86] Rename matchBitOpReduction to matchScalarReduction. NFCI. This doesn't need to be just for bitops, but the ops do need to be fully associative. llvm-svn: 375445	2019-10-21 19:19:50 +00:00
Craig Topper	e78414622d	[X86] Check Subtarget.hasSSE3() before calling shouldUseHorizontalOp and emitting X86ISD::FHADD in LowerUINT_TO_FP_i64. This was a regression from r375341. Fixes PR43729. llvm-svn: 375381	2019-10-20 23:54:19 +00:00
Simon Pilgrim	10213b9073	[X86] Pulled out helper to decode target shuffle element sentinel values to 'Zeroable' known undef/zero bits. NFCI. Renamed 'resolveTargetShuffleAndZeroables' to 'resolveTargetShuffleFromZeroables' to match. llvm-svn: 375348	2019-10-19 16:58:24 +00:00
Simon Pilgrim	b5088aa944	[X86][SSE] lowerV16I8Shuffle - tryToWidenViaDuplication - undef unpack args tryToWidenViaDuplication lowers using the shuffle_v8i16(unpack_v16i8(shuffle_v8i16(x),shuffle_v8i16(x))) pattern, but the unpack only needs the even/odd 16i8 args if the original v16i8 shuffle mask references the even/odd elements - which isn't true for many extension style shuffles. llvm-svn: 375342	2019-10-19 13:18:02 +00:00
Simon Pilgrim	6ada70d1b5	[X86][SSE] LowerUINT_TO_FP_i64 - only use HADDPD for size/fast-hops We were always generating a single source HADDPD, but really we should only do this if shouldUseHorizontalOp says its a good idea. Differential Revision: https://reviews.llvm.org/D69175 llvm-svn: 375341	2019-10-19 11:53:48 +00:00
Simon Pilgrim	696794b66e	[X86] combineX86ShufflesRecursively - pull out isTargetShuffleVariableMask. NFCI. llvm-svn: 375253	2019-10-18 16:39:01 +00:00
David Zarzycki	7b9fd37fa1	[X86] Emit KTEST when possible https://reviews.llvm.org/D69111 llvm-svn: 375197	2019-10-18 03:45:52 +00:00
Sam Parker	39af8a3a3b	[DAGCombine][ARM] Enable extending masked loads Add generic DAG combine for extending masked loads. Allow us to generate sext/zext masked loads which can access v4i8, v8i8 and v4i16 memory to produce v4i32, v8i16 and v4i32 respectively. Differential Revision: https://reviews.llvm.org/D68337 llvm-svn: 375085	2019-10-17 07:55:55 +00:00
Simon Pilgrim	50dc09dd16	[X86] combineX86ShufflesRecursively - split the getTargetShuffleInputs call from the resolveTargetShuffleAndZeroables call. Exposes an issue in getFauxShuffleMask where the OR(SHUFFLE,SHUFFLE) decode should always resolve zero/undef elements. Part of the fix for PR43024 where ideally we shouldn't call resolveTargetShuffleAndZeroables for Depth == 0 llvm-svn: 374928	2019-10-15 17:59:13 +00:00
David Zarzycki	59390efef2	[X86] Make memcmp() use PTEST if possible and also enable AVX1 llvm-svn: 374922	2019-10-15 17:40:12 +00:00
Simon Pilgrim	70778444c7	[X86] Resolve KnownUndef/KnownZero bits into target shuffle masks in helper. NFCI. llvm-svn: 374878	2019-10-15 11:13:51 +00:00
Craig Topper	b2661a2d15	[X86] Don't check for VBROADCAST_LOAD being a user of the source of a VBROADCAST when trying to share broadcasts. The only things VBROADCAST_LOAD uses is an address and a chain node. It has no vector inputs. So if its a user of the source of another broadcast that could only mean one of two things. The other broadcast is broadcasting the address of the broadcast_load. Or the source is a load and the use we're seeing is the chain result from that load. Neither of these cases make sense to combine here. This issue was reported post-commit r373871. Test case has not been reduced yet. llvm-svn: 374862	2019-10-15 06:10:11 +00:00
Craig Topper	f4d03213f3	[X86] Teach EmitTest to handle ISD::SSUBO/USUBO in order to use the Z flag from the subtract directly during isel. This prevents isel from emitting a TEST instruction that optimizeCompareInstr will need to remove later. In some of the modified tests, the SUB gets duplicated due to the flags being needed in two places and being clobbered in between. optimizeCompareInstr was able to optimize away the TEST that was using the result of one of them, but optimizeCompareInstr doesn't know to turn SUB into CMP after removing the TEST. It only knows how to turn SUB into CMP if the result was already dead. With this change the TEST never exists, so optimizeCompareInstr doesn't have to remove it. Then it can just turn the SUB into CMP immediately. Fixes PR43649. llvm-svn: 374755	2019-10-14 06:47:56 +00:00
Simon Pilgrim	11495e5acb	[X86] getTargetShuffleInputs - Control KnownUndef mask element resolution as well as KnownZero. We were already controlling whether the KnownZero elements were being written to the target mask, this extends it to the KnownUndef elements as well so we can prevent the target shuffle mask being manipulated at all. llvm-svn: 374732	2019-10-13 19:35:35 +00:00
Craig Topper	25eb219959	[X86] Enable use of avx512 saturating truncate instructions in more cases. This enables use of the saturating truncate instructions when the result type is less than 128 bits. It also enables the use of saturating truncate instructions on KNL when the input is less than 512 bits. We can do this by widening the input and then extracting the result. llvm-svn: 374731	2019-10-13 19:07:28 +00:00
Simon Pilgrim	3efafd6c38	[X86] SimplifyMultipleUseDemandedBitsForTargetNode - use getTargetShuffleInputs with KnownUndef/Zero results. llvm-svn: 374725	2019-10-13 17:03:11 +00:00
Simon Pilgrim	e4c58db8bc	[X86] getTargetShuffleInputs - add KnownUndef/Zero output support Adjust SimplifyDemandedVectorEltsForTargetNode to use the known elts masks instead of recomputing it locally. llvm-svn: 374724	2019-10-13 17:03:02 +00:00
Craig Topper	d50cb9ac8c	[X86] Add a one use check on the setcc to the min/max canonicalization code in combineSelect. This seems to improve std::midpoint code where we have a min and a max with the same condition. If we split the setcc we can end up with two compares if the one of the operands is a constant. Since we aggressively canonicalize compares with constants. For non-constants it can interfere with our ability to share control flow if we need to expand cmovs into control flow. I'm also not sure I understand this min/max canonicalization code. The motivating case talks about comparing with 0. But we don't check for 0 explicitly. Removes one instruction from the codegen for PR43658. llvm-svn: 374706	2019-10-13 06:48:05 +00:00
Craig Topper	bf57aa2b25	[X86] Enable v4i32->v4i16 and v8i16->v8i8 saturating truncates to use pack instructions with avx512. llvm-svn: 374705	2019-10-13 05:47:47 +00:00
Simon Pilgrim	6716512670	[X86] scaleShuffleMask - use size_t Scale to avoid overflow warnings llvm-svn: 374674	2019-10-12 18:33:47 +00:00
Simon Pilgrim	66417a9f03	Replace for-loop of SmallVector::push_back with SmallVector::append. NFCI. llvm-svn: 374669	2019-10-12 16:37:02 +00:00
Simon Pilgrim	37041c7d22	Fix cppcheck shadow variable name warnings. NFCI. llvm-svn: 374668	2019-10-12 16:36:52 +00:00
Simon Pilgrim	6446079add	[X86] Use any_of/all_of patterns in shuffle mask pattern recognisers. NFCI. llvm-svn: 374667	2019-10-12 16:36:44 +00:00
Simon Pilgrim	9f0885d38d	[X86][SSE] Avoid unnecessary PMOVZX in v4i8 sum reduction This should go away once D66004 has landed and we can simplify shuffle chains using demanded elts. llvm-svn: 374658	2019-10-12 15:19:13 +00:00
Craig Topper	9bd542dcd5	[X86] Use pack instructions for packus/ssat truncate patterns when 256-bit is the largest legal vector and the result type is at least 256 bits. Since the input type is larger than 256-bits we'll need to some concatenating to reassemble the results. The pack instructions ability to concatenate while packing make this a shorter/faster sequence. llvm-svn: 374643	2019-10-12 07:59:29 +00:00
Craig Topper	3472feb94c	[X86] Fold a VTRUNCS/VTRUNCUS+store into a saturating truncating store. We already did this for VTRUNCUS with a specific combination of types. This extends this to VTRUNCS and handles any types where a truncating store is legal. llvm-svn: 374615	2019-10-12 00:01:08 +00:00
Simon Pilgrim	af6c15f679	[X86][SSE] Add support for v4i8 add reduction llvm-svn: 374579	2019-10-11 17:54:15 +00:00
Simon Pilgrim	6434eac860	[X86] isFNEG - add recursion depth limit Now that its used by isNegatibleForFree we should try to avoid costly deep recursion llvm-svn: 374534	2019-10-11 11:34:18 +00:00
Craig Topper	ccc85ac855	[X86] Add a DAG combine to turn v16i16->v16i8 VTRUNCUS+store into a saturating truncating store. llvm-svn: 374509	2019-10-11 04:16:49 +00:00
Craig Topper	b560fd6c52	[X86] Improve the AVX512 bailout in combineTruncateWithSat to allow pack instructions in more situations. If we don't have VLX we won't end up selecting a saturating truncate for 256-bit or smaller vectors so we should just use the pack lowering. llvm-svn: 374487	2019-10-11 00:38:51 +00:00
Craig Topper	4ee7f8365f	[X86] Guard against leaving a dangling node in combineTruncateWithSat. When handling the packus pattern for i32->i8 we do a two step process using a packss to i16 followed by a packus to i8. If the final i8 step is a type with less than 64-bits the packus step will return SDValue(), but the i32->i16 step might have succeeded. This leaves the nodes from the middle step dangling. Guard against this by pre-checking that the number of elements is at least 8 before doing the middle step. With that check in place this should mean the only other case the middle step itself can fail is when SSE2 is disabled. So add an early SSE2 check then just assert that neither the middle or final step ever fail. llvm-svn: 374460	2019-10-10 21:46:52 +00:00
Craig Topper	0e561437c5	[X86] Use packusdw+vpmovuswb to implement v16i32->V16i8 that clamps signed inputs to be between 0 and 255 when zmm registers are disabled on SKX. If we've disable zmm registers, the v16i32 will need to be split. This split will propagate through min/max the truncate. This creates two sequences that need to be concatenated back to v16i8. We can instead use packusdw to do part of the clamping, truncating, and concatenating all at once. Then we can use a vpmovuswb to finish off the clamp. Differential Revision: https://reviews.llvm.org/D68763 llvm-svn: 374431	2019-10-10 19:40:44 +00:00
Simon Pilgrim	6a38474f77	[X86] combineFMA - Convert to use isNegatibleForFree/GetNegatedExpression. Split off from D67557. llvm-svn: 374356	2019-10-10 14:14:12 +00:00
Simon Pilgrim	f096443a98	[X86] combineFMADDSUB - Convert to use isNegatibleForFree/GetNegatedExpression. Split off from D67557, fixes the compile time regression mentioned in rL372756 llvm-svn: 374351	2019-10-10 13:46:44 +00:00
Simon Pilgrim	08c2f530ec	[DAG][X86] Add isNegatibleForFree/GetNegatedExpression override placeholders. NFCI. Continuing to undo the rL372756 reversion. Differential Revision: https://reviews.llvm.org/D67557 llvm-svn: 374345	2019-10-10 13:29:35 +00:00
Philip Reames	931120846e	Conservatively add volatility and atomic checks in a few places As background, starting in D66309, I'm working on support unordered atomics analogous to volatile flags on normal LoadSDNode/StoreSDNodes for X86. As part of that, I spent some time going through usages of LoadSDNode and StoreSDNode looking for cases where we might have missed a volatility check or need an atomic check. I couldn't find any cases that clearly miscompile - i.e. no test cases - but a couple of pieces in code loop suspicious though I can't figure out how to exercise them. This patch adds defensive checks and asserts in the places my manual audit found. If anyone has any ideas on how to either a) disprove any of the checks, or b) hit the bug they might be fixing, I welcome suggestions. Differential Revision: https://reviews.llvm.org/D68419 llvm-svn: 374261	2019-10-09 23:43:33 +00:00
Craig Topper	be7f81ece9	[X86] Shrink zero extends of gather indices from type less than i32 to types larger than i32. Gather instructions can use i32 or i64 elements for indices. If the index is zero extended from a type smaller than i32 to i64, we can shrink the extend to just extend to i32. llvm-svn: 373982	2019-10-07 23:03:12 +00:00
Reid Kleckner	f9b67b810e	[X86] Add new calling convention that guarantees tail call optimization When the target option GuaranteedTailCallOpt is specified, calls with the fastcc calling convention will be transformed into tail calls if they are in tail position. This diff adds a new calling convention, tailcc, currently supported only on X86, which behaves the same way as fastcc, except that the GuaranteedTailCallOpt flag does not need to enabled in order to enable tail call optimization. Patch by Dwight Guth <dwight.guth@runtimeverification.com>! Reviewed By: lebedev.ri, paquette, rnk Differential Revision: https://reviews.llvm.org/D67855 llvm-svn: 373976	2019-10-07 22:28:58 +00:00
Simon Pilgrim	9c2e123043	[X86][SSE] getTargetShuffleInputs - move VT.isSimple/isVector checks inside. NFCI. Stop all the callers from having to check the value type before calling getTargetShuffleInputs. llvm-svn: 373915	2019-10-07 16:15:20 +00:00
Simon Pilgrim	b4ba3cbda0	[X86][AVX] Access a scalar float/double as a free extract from a broadcast load (PR43217) If a fp scalar is loaded and then used as both a scalar and a vector broadcast, perform the load as a broadcast and then extract the scalar for 'free' from the 0th element. This involved switching the order of the X86ISD::BROADCAST combines so we only convert to X86ISD::BROADCAST_LOAD once all other canonicalizations have been attempted. Adds a DAGCombinerInfo::recursivelyDeleteUnusedNodes wrapper. Fixes PR43217 Differential Revision: https://reviews.llvm.org/D68544 llvm-svn: 373871	2019-10-06 21:11:45 +00:00
Simon Pilgrim	d84cd7caa8	Fix signed/unsigned warning. NFCI llvm-svn: 373870	2019-10-06 19:54:20 +00:00
Simon Pilgrim	739c9f0b79	[X86][SSE] Remove resolveTargetShuffleInputs and use getTargetShuffleInputs directly. Move the resolveTargetShuffleInputsAndMask call to after the shuffle mask combine before the undef/zero constant fold instead. llvm-svn: 373868	2019-10-06 19:07:00 +00:00
Simon Pilgrim	42010dc810	[X86][SSE] Don't merge known undef/zero elements into target shuffle masks. Replaces setTargetShuffleZeroElements with getTargetShuffleAndZeroables which reports the Zeroable elements but doesn't merge them into the decoded target shuffle mask (the merging has been moved up into getTargetShuffleInputs until we can get rid of it entirely). This is part of the work to fix PR43024 and allow us to use SimplifyDemandedElts to simplify shuffle chains - we need to get to a point where the target shuffle mask isn't adjusted by its source inputs but instead we cache them in a parallel Zeroable mask. llvm-svn: 373867	2019-10-06 19:06:45 +00:00
Craig Topper	570ae49d03	[X86] Add custom type legalization for v16i64->v16i8 truncate and v8i64->v8i8 truncate when v8i64 isn't legal Summary: The default legalization for v16i64->v16i8 tries to create a multiple stage truncate concatenating after each stage and truncating again. But avx512 implements truncates with multiple uops. So it should be better to truncate all the way to the desired element size and then concatenate the pieces using unpckl instructions. This minimizes the number of 2 uop truncates. The unpcks are all single uop instructions. I tried to handle this by just custom splitting the v16i64->v16i8 shuffle. And hoped that the DAG combiner would leave the two halves in the state needed to make D68374 do the job for each half. This worked for the first half, but the second half got messed up. So I've implemented custom handling for v8i64->v8i8 when v8i64 needs to be split to produce the VTRUNCs directly. Reviewers: RKSimon, spatel Reviewed By: RKSimon Subscribers: hiraditya, llvm-commits Tags: #llvm Differential Revision: https://reviews.llvm.org/D68428 llvm-svn: 373864	2019-10-06 18:43:08 +00:00
Simon Pilgrim	5c876303ec	[X86][SSE] resolveTargetShuffleInputs - call getTargetShuffleInputs instead of using setTargetShuffleZeroElements directly. NFCI. llvm-svn: 373855	2019-10-06 15:42:25 +00:00
Simon Pilgrim	2dee7e5561	[X86][AVX] combineExtractSubvector - merge duplicate variables. NFCI. llvm-svn: 373849	2019-10-06 13:25:10 +00:00
Simon Pilgrim	032dd9b086	[X86][SSE] matchVectorShuffleAsBlend - use Zeroable element mask directly. We can make use of the Zeroable mask to indicate which elements we can safely set to zero instead of creating a target shuffle mask on the fly. This allows us to remove createTargetShuffleMask. This is part of the work to fix PR43024 and allow us to use SimplifyDemandedElts to simplify shuffle chains - we need to get to a point where the target shuffle masks isn't adjusted by its source inputs in setTargetShuffleZeroElements but instead we cache them in a parallel Zeroable mask. llvm-svn: 373846	2019-10-06 12:38:38 +00:00
David Zarzycki	7653ff398d	[X86] Enable AVX512BW for memcmp() llvm-svn: 373845	2019-10-06 10:25:52 +00:00
Simon Pilgrim	8815be04ec	[X86][AVX] Push sign extensions of comparison bool results through bitops (PR42025) As discussed on PR42025, with more complex boolean math we can end up with many truncations/extensions of the comparison results through each bitop. This patch handles the cases introduced in combineBitcastvxi1 by pushing the sign extension through the AND/OR/XOR ops so its just the original SETCC ops that gets extended. Differential Revision: https://reviews.llvm.org/D68226 llvm-svn: 373834	2019-10-05 20:49:34 +00:00
Simon Pilgrim	9ecacb0d54	[X86] lowerShuffleAsLanePermuteAndRepeatedMask - variable renames. NFCI. Rename some variables to match lowerShuffleAsRepeatedMaskAndLanePermute - prep work toward adding some equivalent sublane functionality. llvm-svn: 373832	2019-10-05 16:08:30 +00:00
Craig Topper	074fa390d2	[X86] Add DAG combine to form saturating VTRUNCUS/VTRUNCS from VTRUNC We already do this for ISD::TRUNCATE, but we can do the same for X86ISD::VTRUNC Differential Revision: https://reviews.llvm.org/D68432 llvm-svn: 373765	2019-10-04 17:53:18 +00:00
Craig Topper	185ee6ec7c	[X86] Add v32i8 shuffle lowering strategy to recognize two v4i64 vectors truncated to v4i8 and concatenated into the lower 8 bytes with undef/zero upper bytes. This patch recognizes the shuffle pattern we get from a v8i64->v8i8 truncate when v8i64 isn't a legal type. With VLX we can use two VTRUNCs, unpckldq, and a insert_subvector. Diffrential Revision: https://reviews.llvm.org/D68374 llvm-svn: 373645	2019-10-03 18:34:42 +00:00
Simon Pilgrim	eb8d85e5db	[X86] matchShuffleWithSHUFPD - use Zeroable element mask directly. NFCI. We can make use of the Zeroable mask to indicate which elements we can safely set to zero instead of creating a target shuffle mask on the fly. This only leaves one user of createTargetShuffleMask which we can hopefully get rid of in a similar manner. This is part of the work to fix PR43024 and allow us to use SimplifyDemandedElts to simplify shuffle chains - we need to get to a point where the target shuffle masks isn't adjusted by its source inputs in setTargetShuffleZeroElements but instead we cache them in a parallel Zeroable mask. llvm-svn: 373641	2019-10-03 18:13:50 +00:00
Craig Topper	eb420aa379	[X86] Add DAG combine to turn (bitcast (vbroadcast_load)) into just a vbroadcast_load if the scalar size is the same. This improves broadcast load folding of i64 elements on 32-bit targets where i64 isn't legal. Previously we had to represent these as vXf64 vbroadcast_loads and a bitcast to vXi64. But we didn't have any isel patterns looking for that. This also allows us to remove or simplify some isel patterns that were looking for bitcasted vbroadcast_loads. llvm-svn: 373566	2019-10-03 05:30:02 +00:00
Craig Topper	74c7d6be28	[X86] Rewrite to the vXi1 subvector insertion code to not rely on the value of bits that might be undef The previous code tried to do a trick where we would extract the subvector from the location we were inserting. Then xor that with the new value. Take the xored value and clear out the bits above the subvector size. Then shift that xored subvector to the insert location. And finally xor that with the original vector. Since the old subvector was used in both xors, this would leave just the new subvector at the inserted location. Since the surrounding bits had been zeroed no other bits of the original vector would be modified. Unfortunately, if the old subvector came from undef we might aggressively propagate the undef. Then we end up with the XORs not cancelling because they aren't using the same value for the two uses of the old subvector. @bkramer gave me a case that demonstrated this, but we haven't reduced it enough to make it easily readable to see what's happening. This patch uses a safer, but more costly approach. It isolate the bits above the insertion and bits below the insert point and ORs those together leaving 0 for the insertion location. Then widens the subvector with 0s in the upper bits, shifts it into position with 0s in the lower bits. Then we do another OR. Differential Revision: https://reviews.llvm.org/D68311 llvm-svn: 373495	2019-10-02 17:47:09 +00:00
Craig Topper	8c19925f42	[X86] Add a DAG combine to shrink vXi64 gather/scatter indices that are constant with sufficient sign bits to fit in vXi32 The gather/scatter instructions can implicitly sign extend the indices. If we're operating on 32-bit data, an v16i64 index can force a v16i32 gather to be split in two since the index needs 2 registers. If we can shrink the index to the i32 we can avoid the split. It should always be safe to shrink the index regardless of the number of elements. We have gather/scatter instructions that can use v2i32 index stored in a v4i32 register with v2i64 data size. I've limited this to before legalize types to avoid creating a v2i32 after type legalization. We could check for it, but we'd also need testing. I'm also only handling build_vectors with no bitcasts to be sure the truncate will constant fold. Differential Revision: https://reviews.llvm.org/D68247 llvm-svn: 373408	2019-10-01 23:18:31 +00:00
Craig Topper	105e82edde	[X86] Add a VBROADCAST_LOAD ISD opcode representing a scalar load broadcasted to a vector. Summary: This adds the ISD opcode and a DAG combine to create it. There are probably some places where we can directly create it, but I'll leave that for future work. This updates all of the isel patterns to look for this new node. I had to add a few additional isel patterns for aligned extloads which we should probably fix with a DAG combine or something. This does mean that the broadcast load folding for avx512 can no longer match a broadcasted aligned extload. There's still some work to do here for combining a broadcast of a broadcast_load. We also need to improve extractelement or demanded vector elements of a broadcast_load. I'll try to get those done before I submit this patch. Reviewers: RKSimon, spatel Reviewed By: RKSimon Subscribers: hiraditya, llvm-commits Tags: #llvm Differential Revision: https://reviews.llvm.org/D68198 llvm-svn: 373349	2019-10-01 16:28:20 +00:00
Matt Arsenault	f24ac13aaa	TLI: Remove DAG argument from getRegisterByName Replace with the MachineFunction. X86 is the only user, and only uses it for the function. This removes one obstacle from using this in GlobalISel. The other is the more tolerable EVT argument. The X86 use of the function seems questionable to me. It checks hasFP, before frame lowering. llvm-svn: 373292	2019-10-01 01:44:39 +00:00
Craig Topper	3405237f77	[X86] Mask off upper bits of splat element in LowerBUILD_VECTORvXi1 when forming a SELECT. The i1 scalar would have been type legalized to i8, but that doesn't guarantee anything about the upper bits. If we're going to use it as condition we need to make sure the upper bits are 0. I've special cased ISD::SETCC conditions since that should guarantee zero upper bits. We could go further and use computeKnownBits, but we have no tests that would need that. Fixes PR43507. llvm-svn: 373246	2019-09-30 18:43:44 +00:00
Craig Topper	8216414fd1	[X86] Address post-commit review from code I accidentally commited in r373136. See https://reviews.llvm.org/D68167 llvm-svn: 373245	2019-09-30 18:43:27 +00:00
Craig Topper	299ebacfe9	[X86] Add ANY_EXTEND to switch in ReplaceNodeResults, but just fall back to default handling. ANY_EXTEND of v8i8 is marked Custom on AVX512 for handling extends from v8i8. But the type legalization infrastructure will call ReplaceNodeResults for v8i8 results. We should just defer it the default handling instead of asserting in the default of the switch. Fixes PR43509. llvm-svn: 373234	2019-09-30 17:14:22 +00:00
Craig Topper	1b0ea0a12e	[X86] Split v16i32/v8i64 bitreverse on avx512f targets without avx512bw to enable the use of vpshufb on the 256-bit halves. llvm-svn: 373177	2019-09-30 03:14:38 +00:00
Fangrui Song	6c320b22cd	[X86] Fix -Wunused-variable in -DLLVM_ENABLE_ASSERTIONS=off builds after r373174 llvm-svn: 373175	2019-09-30 02:06:23 +00:00
Craig Topper	1069c01924	[X86] Remove -x86-experimental-vector-widening-legalization command line flag This was added back to allow some performance regressions to be investigated. The main perf issue was fixed shortly after adding this back and no other major issues have been reported. So I think its safe to remove this again. llvm-svn: 373174	2019-09-29 23:32:37 +00:00
Craig Topper	0ac4aacea8	[X86] Enable canonicalizeBitSelect for AVX512 since we can use VPTERNLOG now. llvm-svn: 373155	2019-09-29 01:24:22 +00:00
Craig Topper	82a707e941	[X86] Stop using UpdateNodeOperands in combineGatherScatter. Create new nodes like most other DAG combines. Creating new nodes is what we usually do. Have to explicitly check that we don't update to an existing node and having to manually manage the worklist is unusual. We can probably add a helper function to reduce the duplication of having to check if we should create a gather or scatter, but I wanted to just get the simple thing done. llvm-svn: 373137	2019-09-28 01:08:46 +00:00
Craig Topper	22984ebd0e	[X86] Split combineGatherScatter into a version for generic ISD nodes and another version for X86 specific nodes. The majority of the code doesn't run on the X86 nodes today since its gated by isBeforeLegalizeOps and we don't formm X86 nodes until after that. Except for a couple special case in type legalization. But I think we would probably break those if some of the transforms fire on them. I want to remove the hardcoded operand numbers and the unusual use of UpdateNodeOperands. Being able to know which ISD opcodes are present should help with that. llvm-svn: 373136	2019-09-28 01:06:58 +00:00
Craig Topper	750bdda638	[X86] Call SimplifyDemandedBits in combineGatherScatter any time the mask element is wider than i1, not just when AVX512 is disabled. The AVX2 intrinsics can still be used when AVX512 is enabled and those go through this path. So we should simplify them. llvm-svn: 373108	2019-09-27 18:23:55 +00:00
Guillaume Chatelet	18f805a7ea	[Alignment][NFC] Remove unneeded llvm:: scoping on Align types llvm-svn: 373081	2019-09-27 12:54:21 +00:00
Ilya Biryukov	60e5e0b667	Revert r372333: [DAG][X86] Convert isNegatibleForFree/GetNegatedExpression to a target hook (PR42863) Reason: this caused severe compile time regressions in JAX. See email thread of original revision on llvm-commits for details: http://lists.llvm.org/pipermail/llvm-commits/Week-of-Mon-20190923/697042.html llvm-svn: 372756	2019-09-24 13:48:02 +00:00
Craig Topper	e3c2163ffe	[X86] Use TargetConstant for condition code on X86ISD::SETCC/CMOV/BRCOND nodes. This removes the need for ConvertToTarget opcodes in the isel table. It's also consistent with the recent changes to use TargetConstant for intrinsic nodes that always take immediates. Differential Revision: https://reviews.llvm.org/D67902 llvm-svn: 372645	2019-09-23 19:48:20 +00:00
Sanjay Patel	31b9dfe23f	[x86] fix assert with horizontal math + broadcast of vector (PR43402) https://bugs.llvm.org/show_bug.cgi?id=43402 llvm-svn: 372606	2019-09-23 13:30:23 +00:00
Craig Topper	5e26064c40	[X86] Remove SETEQ/SETNE canonicalization code from LowerIntVSETCC_AVX512 to prevent an infinite loop. The attached test case would previous infinite loop after r365711. I'm going to move this to X86ISelDAGToDAG.cpp to get the setcc to match VPTEST in 32-bit mode in a follow up commit. llvm-svn: 372543	2019-09-23 05:35:20 +00:00
David Zarzycki	a7a515cb77	Prefer AVX512 memcpy when applicable When AVX512 is available and the preferred vector width is 512-bits or more, we should prefer AVX512 for memcpy(). https://bugs.llvm.org/show_bug.cgi?id=43240 https://reviews.llvm.org/D67874 llvm-svn: 372540	2019-09-23 05:00:59 +00:00
Craig Topper	da4a4707d2	[X86] Convert to Constant arguments to MMX shift by i32 intrinsics to TargetConstant during lowering. This allows us to use timm in the isel table which is more consistent with other intrinsics that take an immediate now. We can't declare the intrinsic as taking an ImmArg because we need to match non-constants to the shift by MMX register instruction which we do by mutating the intrinsic id during lowering. llvm-svn: 372537	2019-09-23 01:21:51 +00:00
Craig Topper	a533e87792	[X86][SelectionDAGBuilder] Move the hack for handling MMX shift by i32 intrinsics into the X86 backend. This intrinsics should be shift by immediate, but gcc allows any i32 scalar and clang needs to match that. So we try to detect the non-constant case and move the data from an integer register to an MMX register. Previously this was done by creating a v2i32 build_vector and bitcast in SelectionDAGBuilder. This had to be done early since v2i32 isn't a legal type. The bitcast+build_vector would be DAG combined to X86ISD::MMX_MOVW2D which isel will turn into a GPR->MMX MOVD. This commit just moves the whole thing to lowering and emits the X86ISD::MMX_MOVW2D directly to avoid the illegal type. The test changes just seem to be due to nodes being linearized in a different order. llvm-svn: 372535	2019-09-23 01:05:33 +00:00
Sterling Augustine	4a58936716	Fix missed case of switching getConstant to getTargetConstant. Try 2. Summary: This fixes a crasher introduced by r372338. Reviewers: echristo, arsenm Subscribers: wdng, hiraditya, llvm-commits Tags: #llvm Differential Revision: https://reviews.llvm.org/D67850 llvm-svn: 372434	2019-09-20 22:26:55 +00:00
Nico Weber	03475adcf7	Revert r372366 "Use getTargetConstant for BLENDI, and add a test to catch it." This reverts commit `52621307bc`. Tests have been failing all night with [0/2] ACTION //llvm/test:check-llvm(//llvm/utils/gn/build/toolchain:unix) -- Testing: 33647 tests, 64 threads -- Testing: 0 .. 10.. UNRESOLVED: LLVM :: CodeGen/AMDGPU/GlobalISel/isel-blendi-gettargetconstant.ll (6943 of 33647) ****************** TEST 'LLVM :: CodeGen/AMDGPU/GlobalISel/isel-blendi-gettargetconstant.ll' FAILED **************** Test has no run line! ****************** Since there were other concerns on https://reviews.llvm.org/D67785, I'm just reverting for now. llvm-svn: 372383	2019-09-20 12:05:29 +00:00
Craig Topper	621c93ec1f	[X86] Convert tbm_bextri_u32/tbm_bextri_u64 intrinsics TargetConstant argument to a regular Constant during lowering. We reuse an ISD opcode here that can be reached from BMI that doesn't require it to be an immediate. Our isel patterns to match the TBM immediate form require a Constant and not a TargetConstant. We were accidentally getting the Constant due to a quirk of combineBEXTR calling SimplifyDemandedBits. The call to SimplifyDemandedBits ended up constant folding the TargetConstant to a regular Constant. But we should probably instead be asserting if SimplifyDemandedBits on a TargetConstant so we shouldn't rely on this behavior. llvm-svn: 372373	2019-09-20 07:00:22 +00:00
Sterling Augustine	52621307bc	Use getTargetConstant for BLENDI, and add a test to catch it. Summary: This fixes a crasher introduced by r372338. Reviewers: echristo, arsenm Subscribers: jvesely, wdng, nhaehnle, hiraditya, llvm-commits Tags: #llvm Differential Revision: https://reviews.llvm.org/D67785 Tighten up the test case. llvm-svn: 372366	2019-09-20 02:29:16 +00:00
Craig Topper	081cb7ef23	[X86] Remove the special isBuildVectorOfConstantSDNodes handling from LowerBUILD_VECTORvXi1. The later code that generates a constant when there are some non-const elements works basically the same and doesn't require there to be any non-const elements. llvm-svn: 372365	2019-09-20 01:49:46 +00:00
Matt Arsenault	3ecab8e455	Reapply r372285 "GlobalISel: Don't materialize immarg arguments to intrinsics" This reverts r372314, reapplying r372285 and the commits which depend on it (r372286-r372293, and r372296-r372297) This was missing one switch to getTargetConstant in an untested case. llvm-svn: 372338	2019-09-19 16:26:14 +00:00
Simon Pilgrim	af6043557d	[DAG][X86] Convert isNegatibleForFree/GetNegatedExpression to a target hook (PR42863) This patch converts the DAGCombine isNegatibleForFree/GetNegatedExpression into overridable TLI hooks and includes a demonstration X86 implementation. The intention is to let us extend existing FNEG combines to work more generally with negatible float ops, allowing it work with target specific combines and opcodes (e.g. X86's FMA variants). Unlike the SimplifyDemandedBits, we can't just handle target nodes through a Target callback, we need to do this as an override to allow targets to handle generic opcodes as well. This does mean that the target implementations has to duplicate some checks (recursion depth etc.). I've only begun to replace X86's FNEG handling here, handling FMADDSUB/FMSUBADD negation and some low impact codegen changes (some FMA negatation propagation). We can build on this in future patches. Differential Revision: https://reviews.llvm.org/D67557 llvm-svn: 372333	2019-09-19 15:02:47 +00:00
Hans Wennborg	13bdae8541	Revert r372285 "GlobalISel: Don't materialize immarg arguments to intrinsics" This broke the Chromium build, causing it to fail with e.g. fatal error: error in backend: Cannot select: t362: v4i32 = X86ISD::VSHLI t392, Constant:i8<15> See llvm-commits thread of r372285 for details. This also reverts r372286, r372287, r372288, r372289, r372290, r372291, r372292, r372293, r372296, and r372297, which seemed to depend on the main commit. > Encode them directly as an imm argument to G_INTRINSIC. > > Since now intrinsics can now define what parameters are required to be > immediates, avoid using registers for them. Intrinsics could > potentially want a constant that isn't a legal register type. Also, > since G_CONSTANT is subject to CSE and legalization, transforms could > potentially obscure the value (and create extra work for the > selector). The register bank of a G_CONSTANT is also meaningful, so > this could throw off future folding and legalization logic for AMDGPU. > > This will be much more convenient to work with than needing to call > getConstantVRegVal and checking if it may have failed for every > constant intrinsic parameter. AMDGPU has quite a lot of intrinsics wth > immarg operands, many of which need inspection during lowering. Having > to find the value in a register is going to add a lot of boilerplate > and waste compile time. > > SelectionDAG has always provided TargetConstant for constants which > should not be legalized or materialized in a register. The distinction > between Constant and TargetConstant was somewhat fuzzy, and there was > no automatic way to force usage of TargetConstant for certain > intrinsic parameters. They were both ultimately ConstantSDNode, and it > was inconsistently used. It was quite easy to mis-select an > instruction requiring an immediate. For SelectionDAG, start emitting > TargetConstant for these arguments, and using timm to match them. > > Most of the work here is to cleanup target handling of constants. Some > targets process intrinsics through intermediate custom nodes, which > need to preserve TargetConstant usage to match the intrinsic > expectation. Pattern inputs now need to distinguish whether a constant > is merely compatible with an operand or whether it is mandatory. > > The GlobalISelEmitter needs to treat timm as a special case of a leaf > node, simlar to MachineBasicBlock operands. This should also enable > handling of patterns for some G_ instructions with immediates, like > G_FENCE or G_EXTRACT. > > This does include a workaround for a crash in GlobalISelEmitter when > ARM tries to uses "imm" in an output with a "timm" pattern source. llvm-svn: 372314	2019-09-19 12:33:07 +00:00
Craig Topper	c2d25ed1b3	[X86] Prevent crash in LowerBUILD_VECTORvXi1 for v64i1 vectors on 32-bit targets when the vector is a mix of constants and non-constant. We need to materialize the constants as two 32-bit values that are casted to v32i1 and then concatenated. llvm-svn: 372304	2019-09-19 06:50:39 +00:00
Craig Topper	d103bb654f	[X86] Change a SmallVector& argument to SmallVectorImpl&. NFC Avoids repeating the size. llvm-svn: 372302	2019-09-19 06:27:12 +00:00
Craig Topper	eff4fd6999	[X86] Remove unused argument from a helper function. NFC llvm-svn: 372301	2019-09-19 06:27:07 +00:00
Matt Arsenault	d8399d12cd	GlobalISel: Don't materialize immarg arguments to intrinsics Encode them directly as an imm argument to G_INTRINSIC. Since now intrinsics can now define what parameters are required to be immediates, avoid using registers for them. Intrinsics could potentially want a constant that isn't a legal register type. Also, since G_CONSTANT is subject to CSE and legalization, transforms could potentially obscure the value (and create extra work for the selector). The register bank of a G_CONSTANT is also meaningful, so this could throw off future folding and legalization logic for AMDGPU. This will be much more convenient to work with than needing to call getConstantVRegVal and checking if it may have failed for every constant intrinsic parameter. AMDGPU has quite a lot of intrinsics wth immarg operands, many of which need inspection during lowering. Having to find the value in a register is going to add a lot of boilerplate and waste compile time. SelectionDAG has always provided TargetConstant for constants which should not be legalized or materialized in a register. The distinction between Constant and TargetConstant was somewhat fuzzy, and there was no automatic way to force usage of TargetConstant for certain intrinsic parameters. They were both ultimately ConstantSDNode, and it was inconsistently used. It was quite easy to mis-select an instruction requiring an immediate. For SelectionDAG, start emitting TargetConstant for these arguments, and using timm to match them. Most of the work here is to cleanup target handling of constants. Some targets process intrinsics through intermediate custom nodes, which need to preserve TargetConstant usage to match the intrinsic expectation. Pattern inputs now need to distinguish whether a constant is merely compatible with an operand or whether it is mandatory. The GlobalISelEmitter needs to treat timm as a special case of a leaf node, simlar to MachineBasicBlock operands. This should also enable handling of patterns for some G_ instructions with immediates, like G_FENCE or G_EXTRACT. This does include a workaround for a crash in GlobalISelEmitter when ARM tries to uses "imm" in an output with a "timm" pattern source. llvm-svn: 372285	2019-09-19 01:33:14 +00:00
Guillaume Chatelet	35b4b403b4	[Alignment][NFC] Use Align::None instead of 1 Summary: This is patch is part of a series to introduce an Alignment type. See this thread for context: http://lists.llvm.org/pipermail/llvm-dev/2019-July/133851.html See this patch for the introduction of the type: https://reviews.llvm.org/D64790 Reviewers: courbet Subscribers: sdardis, nemanjai, hiraditya, kbarton, jrtc27, MaskRay, atanasyan, jsji, llvm-commits Tags: #llvm Differential Revision: https://reviews.llvm.org/D67704 llvm-svn: 372230	2019-09-18 15:40:20 +00:00
Craig Topper	93e1f73b6b	[X86] Break non-power of 2 vXi1 vectors into scalars for argument passing with avx512. This generates worse code, but matches what is done for avx2 and prevents crashes when more arguments are passed than we have registers for. llvm-svn: 372200	2019-09-18 06:06:11 +00:00
Craig Topper	4a07336a88	[X86] Prevent assertion when calling a function that returns double with -mno-sse2 on x86-64. As seen in the most recent updates to PR10498 llvm-svn: 372197	2019-09-18 01:57:46 +00:00
Craig Topper	c198ffd8c3	[X86] Use APInt::operator<<= and APInt::lshrInPlace. NFC llvm-svn: 372159	2019-09-17 18:19:06 +00:00
Craig Topper	f9a89b6788	[X86] Simplify b2b KSHIFTL+KSHIFTR using demanded elts. llvm-svn: 372155	2019-09-17 18:02:56 +00:00
Craig Topper	f1ba94ade0	[X86] Call SimplifyDemandedVectorElts on KSHIFTL/KSHIFTR nodes during DAG combine. llvm-svn: 372154	2019-09-17 18:02:52 +00:00
Craig Topper	b50894b9c3	[X86] Simplify some code in LowerBUILD_VECTORvXi1. NFCI The case were Immediate is 0 and HasConstElts is true should never happen since that would mean the constant elts were all zero. But we check for all zero build vector earlier. So just use HasConstElts and blindly take Immediate without checking if its 0. Move the code that bitcasts and extract the immediate into the the HasConstElts case since the other code just creates an undef with the right type. No casting needed. llvm-svn: 372153	2019-09-17 18:02:46 +00:00
Simon Pilgrim	0b10da7cc7	[X86] Use APInt::getLowBitsSet helper. NFCI. Also avoids a static analyzer warning about out of range shifts. llvm-svn: 372103	2019-09-17 10:51:30 +00:00
Graham Hunter	1a9195d817	[SVE][MVT] Fixed-length vector MVT ranges * Reordered MVT simple types to group scalable vector types together. * New range functions in MachineValueType.h to only iterate over the fixed-length int/fp vector types. * Stopped backends which don't support scalable vector types from iterating over scalable types. Reviewers: sdesmalen, greened Reviewed By: greened Differential Revision: https://reviews.llvm.org/D66339 llvm-svn: 372099	2019-09-17 10:19:23 +00:00
Craig Topper	95aea74494	[X86] Split oversized vXi1 vector arguments and return values into scalars on avx512 targets. Previously we tried to split them into narrower v64i1 or v16i1 pieces that each got promoted to vXi8 and then passed in a zmm or xmm register. But this crashes when you need to pass more pieces than available registers reserved for argument passing. The scalarizing done here generates much longer and slower code, but is consistent with the behavior of avx2 and earlier targets for these types. Fixes PR43323. llvm-svn: 372069	2019-09-17 04:41:14 +00:00
Simon Pilgrim	3df0daddfd	[X86][AVX] matchShuffleWithSHUFPD - add support for zeroable operands Determine if all of the uses of LHS/RHS operands can be replaced with a zero vector. llvm-svn: 372013	2019-09-16 17:30:33 +00:00
Craig Topper	8e0f104916	[X86] Use incDecVectorConstant to simplify the min/max code in LowerVSETCC. incDecVectorConstant is used for a similar reason in LowerVSETCCWithSUBUS so we might as well share the code. llvm-svn: 371861	2019-09-13 14:59:08 +00:00
Simon Pilgrim	930ebc15a6	[X86] negateFMAOpcode - extend to support FMADDSUB/FMSUBADD and output negation. NFCI. Some prep work for PR42863, this change allows us to move all the FMA opcode mappings into the negateFMAOpcode helper. For the FMADDSUB/FMSUBADD cases, we can only negate the accumulator - any other negations will result in an error. llvm-svn: 371840	2019-09-13 11:22:40 +00:00
Craig Topper	efe6724b9f	[DAGCombiner][X86] Pass the CmpOpVT to reduceSelectOfFPConstantLoads so X86 can exclude fp128 compares. The X86 decision assumes the compare will produce a result in an XMM register, but that can't happen for an fp128 compare since those go to a libcall the returns an i32. Pass the VT so X86 can check the type. llvm-svn: 371775	2019-09-12 21:30:18 +00:00
Simon Pilgrim	d67661ee24	[X86] Move negateFMAOpcode helper earlier to help future patch. NFCI. llvm-svn: 371770	2019-09-12 20:39:56 +00:00
Reid Kleckner	ff45955fc8	[X86] Fix latent bugs in 32-bit CMPXCHG8B inserter I found three issues: 1. the loop over E[ABCD]X copies run over BB start 2. the direct address of cmpxchg8b could be a frame index 3. the displacement of cmpxchg8b could be a global instead of an immediate These were all introduced together in r287875, and should be fixed with this change. Issue reported by Zachary Turner. llvm-svn: 371678	2019-09-11 21:56:17 +00:00
Craig Topper	08474ca091	[X86] Move x86_64 fp128 conversion to libcalls from type legalization to DAG legalization fp128 is considered a legal type for a register, but has almost no legal operations so everything needs to be converted to a libcall. Previously this was implemented by tricking type legalization into softening the operations with various checks for "is legal in hardware register" to change the behavior to still use f128 as the resulting type instead of converting to i128. This patch abandons this approach and instead moves the libcall conversions to LegalizeDAG. This is the approach taken by AArch64 where they also have a legal fp128 type, but no legal operations. I think this is more in spirit with how SelectionDAG's phases are supposed to work. I had to make some hacks for STRICT_FP_ROUND because some of the strict FP handling checks if ISD::FP_ROUND is Legal for a given result type, but I had to make ISD::FP_ROUND Custom to allow making a libcall when the input is f128. For all other types the Custom handler just returns the original node. These hacks are incomplete and don't work for a strict truncate from f128, but I don't think it worked before either since LegalizeFloatTypes doesn't know about strict ops yet. I've also raised PR43209 against AArch64 which currently crashes on a strict ftrunc from f64->f32 because of FP_ROUND being marked Custom for the same reason there. Differential Revision: https://reviews.llvm.org/D67128 llvm-svn: 371672	2019-09-11 21:30:09 +00:00
Philip Reames	a9beacbac8	[X86] Updated target specific selection dag code to conservatively check for isAtomic in addition to isVolatile See D66309 for context. This is the first sweep of x86 target specific code to add isAtomic bailouts where appropriate. The intention here is to have the switch from AtomicSDNode to LoadSDNode/StoreSDNode be close to NFC; that is, I'm not looking to allow additional optimizations at this time. Sorry for the lack of tests. As discussed in the review, most of these are vector tests (for which atomicity is not well defined) and I couldn't figure out to exercise the anyextend cases which aren't vector specific. Differential Revision: https://reviews.llvm.org/D66322 llvm-svn: 371547	2019-09-10 18:43:15 +00:00
Philip Reames	20aafa3156	Introduce infrastructure for an incremental port of SelectionDAG atomic load/store handling This is the first patch in a large sequence. The eventual goal is to have unordered atomic loads and stores - and possibly ordered atomics as well - handled through the normal ISEL codepaths for loads and stores. Today, there handled w/instances of AtomicSDNodes. The result of which is that all transforms need to be duplicated to work for unordered atomics. The benefit of the current design is that it's harder to introduce a silent miscompile by adding an transform which forgets about atomicity. See the thread on llvm-dev titled "FYI: proposed changes to atomic load/store in SelectionDAG" for further context. Note that this patch is NFC unless the experimental flag is set. The basic strategy I plan on taking is: introduce infrastructure and a flag for testing (this patch) Audit uses of isVolatile, and apply isAtomic conservatively* piecemeal conservative* update generic code and x86 backedge code in individual reviews w/tests for cases which didn't check volatile, but can be found with inspection flip the flag at the end (with minimal diffs) Work through todo list identified in (2) and (3) exposing performance ops (*) The "conservative" bit here is aimed at minimizing the number of diffs involved in (4). Ideally, there'd be none. In practice, getting it down to something reviewable by a human is the actual goal. Note that there are (currently) no paths which produce LoadSDNode or StoreSDNode with atomic MMOs, so we don't need to worry about preserving any behaviour there. We've taken a very similar strategy twice before with success - once at IR level, and once at the MI level (post ISEL). Differential Revision: https://reviews.llvm.org/D66309 llvm-svn: 371441	2019-09-09 19:23:22 +00:00
Craig Topper	5ebd0a6e88	[SelectionDAG] Remove ISD::FP_ROUND_INREG I don't think anything in tree creates this node. So all of this code appears to be dead. Code coverage agrees http://lab.llvm.org:8080/coverage/coverage-reports/llvm/coverage/Users/buildslave/jenkins/workspace/clang-stage2-coverage-R/llvm/lib/CodeGen/SelectionDAG/DAGCombiner.cpp.html Differential Revision: https://reviews.llvm.org/D67312 llvm-svn: 371431	2019-09-09 17:54:44 +00:00
Craig Topper	ce2cb0f09e	[X86] Allow _MM_FROUND_CUR_DIRECTION and _MM_FROUND_NO_EXC to be used together on instructions that only support SAE and not embedded rounding. Current for SAE instructions we only allow _MM_FROUND_CUR_DIRECTION(bit 2) or _MM_FROUND_NO_EXC(bit 3) to be used as the immediate passed to the inrinsics. But these instructions don't perform rounding so _MM_FROUND_CUR_DIRECTION is just sort of a default placeholder when you don't want to suppress exceptions. Using _MM_FROUND_NO_EXC by itself is really bit equivalent to (_MM_FROUND_NO_EXC \| _MM_FROUND_TO_NEAREST_INT) since _MM_FROUND_TO_NEAREST_INT is 0. Since we aren't rounding on these instructions we should also accept (_MM_FROUND_CUR_DIRECTION \| _MM_FROUND_NO_EXC) as equivalent to (_MM_FROUND_NO_EXC). icc allows this, but gcc does not. Differential Revision: https://reviews.llvm.org/D67289 llvm-svn: 371430	2019-09-09 17:48:05 +00:00
Craig Topper	72624b0e59	[X86] Use xorps to create fp128 +0.0 constants. This matches what we do for f32/f64. gcc also does this for fp128. llvm-svn: 371357	2019-09-09 01:35:00 +00:00
Simon Pilgrim	e0ea746215	[X86][SSE] SimplifyDemandedVectorEltsForTargetNode - add faux shuffle support. This patch decodes target and faux shuffles with getTargetShuffleInputs - a reduced version of resolveTargetShuffleInputs that doesn't resolve SM_SentinelZero cases, so we can correctly remove zero vectors if they aren't demanded. llvm-svn: 371353	2019-09-08 21:38:33 +00:00
Craig Topper	77dd86ee4a	[X86] Add a hack to combineVSelectWithAllOnesOrZeros to turn selects with two zero/undef vector inputs into an all zeroes vector. If the two zero vectors have undefs in different places they won't get combined by simplifySelect. This fixes a regression from an earlier commit. llvm-svn: 371351	2019-09-08 20:56:09 +00:00
Craig Topper	9c11901256	[X86] Remove call to getZeroVector from materializeVectorConstant. Add isel patterns for zero vectors with all types. The change to avx512-vec-cmp.ll is a regression, but should be easy to fix. It occurs because the getZeroVector call was canonicalizing both sides to the same node, then SimplifySelect was able to simplify it. But since only called getZeroVector on some VTs this isn't a robust way to combine this. The change to vector-shuffle-combining-ssse3.ll is more instructions, but removes a constant pool load so its unclear if its a regression or not. llvm-svn: 371350	2019-09-08 20:56:05 +00:00
Craig Topper	97d41b8917	[X86] Use DAG.getConstant instead of getZeroVector in combinePMULDQ. getZeroVector canonicalizes the type to vXi32, but that's a legalization action. We should use the most correct type if possible. llvm-svn: 371345	2019-09-08 19:24:42 +00:00
Craig Topper	30837abd96	[X86] Teach materializeVectorConstant to not call getZeroVector/getOnesVector on the types we already have isel patterns for. llvm-svn: 371343	2019-09-08 19:24:29 +00:00
Simon Pilgrim	178cd2cd3a	[X86][SSE] Fix out of range shift introduced in D67070/rL371328 Use APInt to create the comparison mask instead. llvm-svn: 371330	2019-09-08 12:44:22 +00:00
Simon Pilgrim	3262084384	[X86][SSE] Add support for <64 x i1> bool reduction This generalizes the existing <32 x i1> pre-AVX2 split code to support reductions from <64 x i1> as well, we can probably generalize to any larger pow2 case in the future if the (unlikely) need ever arises. We still need to tweak combineBitcastvxi1 to improve AVX512F codegen as its assumes vXi1 types should be handled on the mask registers even when they aren't legal. Differential Revision: https://reviews.llvm.org/D67070 llvm-svn: 371328	2019-09-08 11:46:21 +00:00
Craig Topper	37dd59298f	[X86] Make getZeroVector return floating point vectors in their native type on SSE2 and later. isel used to require zero vectors to be canonicalized to a single type to minimize the number of patterns needed to match. This is no longer required. I plan to do this to integers too, but floating point was simpler to start with. Integer has a complication where v32i16/v64i8 aren't legal when the other 512-bit integer types are. llvm-svn: 371325	2019-09-08 00:43:52 +00:00
Simon Pilgrim	08692e5dd1	[X86] Avoid uses of getZextValue(). NFCI. Use getAPIntValue() directly - this is mainly a best practice style issue to help prevent fuzz tests blowing up when a i12345 (or whatever) is generated. Use getConstantOperandVal/getConstantOperandAPInt wrappers where possible. llvm-svn: 371315	2019-09-07 16:13:57 +00:00
Nikita Popov	314893cc4b	[X86] Fix pshuflw formation from repeated shuffle mask (PR43230) Fix for https://bugs.llvm.org/show_bug.cgi?id=43230. When creating PSHUFLW from a repeated shuffle mask, we have to apply the checks to the repeated mask, not the original one. For the test case from PR43230 the inspected part of the original mask is all undef. Differential Revision: https://reviews.llvm.org/D67314 llvm-svn: 371307	2019-09-07 12:13:44 +00:00
Simon Pilgrim	d7d8bb937a	Fix MSVC "32-bit shift implicitly converted to 64 bits" warnings. NFCI. llvm-svn: 371302	2019-09-07 11:04:04 +00:00
Guillaume Chatelet	ad1cea0dda	[Alignment][NFC] Use Align with TargetLowering::setPrefFunctionAlignment Summary: This is patch is part of a series to introduce an Alignment type. See this thread for context: http://lists.llvm.org/pipermail/llvm-dev/2019-July/133851.html See this patch for the introduction of the type: https://reviews.llvm.org/D64790 Reviewers: courbet Subscribers: nemanjai, javed.absar, hiraditya, kbarton, asb, rbar, johnrusso, simoncook, apazos, sabuasal, niosHD, jrtc27, MaskRay, zzheng, edward-jones, rogfer01, MartinMosbeck, brucehoult, the_o, PkmX, jocewei, jsji, s.egerton, pzheng, ychen, llvm-commits Tags: #llvm Differential Revision: https://reviews.llvm.org/D67267 llvm-svn: 371212	2019-09-06 15:03:49 +00:00
Guillaume Chatelet	9fcf066d0c	[Alignment][NFC] Use Align with TargetLowering::setPrefLoopAlignment Summary: This is patch is part of a series to introduce an Alignment type. See this thread for context: http://lists.llvm.org/pipermail/llvm-dev/2019-July/133851.html See this patch for the introduction of the type: https://reviews.llvm.org/D64790 Reviewers: courbet Subscribers: nemanjai, hiraditya, kbarton, MaskRay, jsji, ychen, llvm-commits Tags: #llvm Differential Revision: https://reviews.llvm.org/D67278 llvm-svn: 371210	2019-09-06 14:51:15 +00:00
Craig Topper	0fde412140	[X86] Enable BuildSDIVPow2 for i16. We're able to use a 32-bit ADD and CMOV here and should work well with our other i16->i32 promotion optimizations. llvm-svn: 371107	2019-09-05 18:49:52 +00:00
Craig Topper	b8d6ba3ca2	[X86] Override BuildSDIVPow2 for X86. As noted in PR43197, we can use test+add+cmov+sra to implement signed division by a power of 2. This is based off the similar version in AArch64, but I've adjusted it to use target independent nodes where AArch64 uses target specific CMP and CSEL nodes. I've also blocked INT_MIN as the transform isn't valid for that. I've limited this to i32 and i64 on 64-bit targets for now and only when CMOV is supported. i8 and i16 need further investigation to be sure they get promoted to i32 well. I adjusted a few tests to enable cmov to demonstrate the new codegen. I also changed twoaddr-coalesce-3.ll to 32-bit mode without cmov to avoid perturbing the scenario that is being set up there. Differential Revision: https://reviews.llvm.org/D67087 llvm-svn: 371104	2019-09-05 18:15:07 +00:00
Sanjay Patel	10412a69f9	[x86] fix horizontal math bug exposed by improved demanded elements analysis (PR43225) https://bugs.llvm.org/show_bug.cgi?id=43225 llvm-svn: 371095	2019-09-05 17:28:17 +00:00
Craig Topper	a5508163ad	[X86] Fix stale comment. NFC We aren't checking for a concat here. We're just always splitting 256-bit stores. llvm-svn: 371092	2019-09-05 17:24:15 +00:00
Simon Pilgrim	29361c704d	[X86][SSE] EltsFromConsecutiveLoads - ignore non-zero offset base loads (PR43227) As discussed on D64551 and PR43227, we don't correctly handle cases where the base load has a non-zero byte offset. Until we can properly handle this, we must bail from EltsFromConsecutiveLoads. llvm-svn: 371078	2019-09-05 15:07:07 +00:00
Guillaume Chatelet	aff45e4b23	[LLVM][Alignment] Make functions using log of alignment explicit Summary: This patch renames functions that takes or returns alignment as log2, this patch will help with the transition to llvm::Align. The renaming makes it explicit that we deal with log(alignment) instead of a power of two alignment. A few renames uncovered dubious assignments: - `MirParser`/`MirPrinter` was expecting powers of two but `MachineFunction` and `MachineBasicBlock` were using deal with log2(align). This patch fixes it and updates the documentation. - `MachineBlockPlacement` exposes two flags (`align-all-blocks` and `align-all-nofallthru-blocks`) supposedly interpreted as power of two alignments, internally these values are interpreted as log2(align). This patch updates the documentation, - `MachineFunctionexposes` exposes `align-all-functions` also interpreted as power of two alignment, internally this value is interpreted as log2(align). This patch updates the documentation, Reviewers: lattner, thegameg, courbet Subscribers: dschuff, arsenm, jyknight, dylanmckay, sdardis, nemanjai, jvesely, nhaehnle, javed.absar, hiraditya, kbarton, fedor.sergeev, asb, rbar, johnrusso, simoncook, apazos, sabuasal, niosHD, jrtc27, MaskRay, zzheng, edward-jones, atanasyan, rogfer01, MartinMosbeck, brucehoult, the_o, dexonsmith, PkmX, jocewei, jsji, Jim, s.egerton, llvm-commits, courbet Tags: #llvm Differential Revision: https://reviews.llvm.org/D65945 llvm-svn: 371045	2019-09-05 10:00:22 +00:00
Reid Kleckner	3fa07dee94	Revert [Windows] Disable TrapUnreachable for Win64, add SEH_NoReturn This reverts r370525 (git commit `0bb1630685`) Also reverts r370543 (git commit `185ddc08ee`) The approach I took only works for functions marked `noreturn`. In general, a call that is not known to be noreturn may be followed by unreachable for other reasons. For example, there could be multiple call sites to a function that throws sometimes, and at some call sites, it is known to always throw, so it is followed by unreachable. We need to insert an `int3` in these cases to pacify the Windows unwinder. I think this probably deserves its own standalone, Win64-only fixup pass that runs after block placement. Implementing that will take some time, so let's revert to TrapUnreachable in the mean time. llvm-svn: 370829	2019-09-03 22:27:27 +00:00
Simon Pilgrim	99525bbe49	[X86] Merge 2 consecutive HasInt256 branches. NFCI. llvm-svn: 370761	2019-09-03 14:39:06 +00:00
Craig Topper	b915109043	[X86] Simplify the setOperationAction handling for fp_to_uint by improving the Custom handler a bit. This merges the 32-bit and 64-bit mode code to just use Custom for both i32 and i64. We already had most of the handling in the custom handling due to the AVX512 having legal fp_to_uint. Just needed to add the i32->i64 promotion handling. Refactor the fp_to_uint code in the custom handler to simplify the number of times we check things. Tweak cost model tables to match the default handling we were getting due to Expand before. llvm-svn: 370700	2019-09-03 05:57:22 +00:00
Craig Topper	9dc8c448ed	[X86] Don't use Expand for i32 fp_to_uint on SSE1/2 targets on 32-bit target. Use Custom lowering instead. Fall back to default expansion only when the scalar FP type belongs in an XMM register. This improves lowering for i32 to fp80, and also i32 to double on SSE1 only. llvm-svn: 370699	2019-09-03 05:57:18 +00:00
Craig Topper	dcecc7ea46	[X86] Custom promote i32->f80 uint_to_fp on AVX512 64-bit targets. Reuse the same code to promote all i32 uint_to_fp on 64-bit targets to simplify the X86ISelLowering constructor. llvm-svn: 370693	2019-09-03 02:51:10 +00:00
Craig Topper	45cd185109	[X86] Enable fp128 as a legal type with SSE1 rather than with MMX. FP128 values are passed in xmm registers so should be asssociated with an SSE feature rather than MMX which uses a different set of registers. llc enables sse1 and sse2 by default with x86_64. But does not enable mmx. Clang enables all 3 features by default. I've tried to add command lines to test with -sse where possible, but any test that returns a value in an xmm register fails with a fatal error with -sse since we have no defined ABI for that scenario. llvm-svn: 370682	2019-09-02 20:16:30 +00:00
Simon Pilgrim	fb5661a884	[X86] getPMOVMSKB - add MVT::v64i8 handling and remove from combineBitcastvxi1. NFCI. llvm-svn: 370670	2019-09-02 15:10:35 +00:00
Simon Pilgrim	05a3a92751	[X86] combineHorizontalPredicateResult - pull out repeated getTargetLoweringInfo() calls. NFCI. llvm-svn: 370637	2019-09-02 10:42:48 +00:00
Simon Pilgrim	07de5292e5	[X86][AVX] Rename + cleanup lowerShuffleAsLanePermuteAndBlend. NFCI. Rename to lowerShuffleAsLanePermuteAndShuffle to make it clear that not just blends are performed. Cleanup the in-lane shuffle mask generation to make it more obvious what's going on. Some prep work noticed while investigating the poor shuffle code mentioned in D66004. llvm-svn: 370613	2019-09-01 16:04:28 +00:00
Simon Pilgrim	27cc2efaf2	Fix shadow variable warning. NFCI. llvm-svn: 370610	2019-09-01 13:10:18 +00:00
Simon Pilgrim	f8d1d00190	[X86] EltsFromConsecutiveLoads - Don't confuse elt count with vector element count (PR43170) EltsFromConsecutiveLoads was assuming that the number of input elts was the same as the number of elements in the output vector type when creating a zeroing shuffle, causing an assert when subvectors were being combined instead of just scalars. llvm-svn: 370592	2019-08-31 16:21:31 +00:00
Simon Pilgrim	cffbec63d6	Fix shadow variable warning by making CondCodes names more explicit. NFCI. llvm-svn: 370589	2019-08-31 15:19:59 +00:00
Simon Pilgrim	ad020c0af1	Fix shadow variable warning. NFCI. llvm-svn: 370585	2019-08-31 15:01:03 +00:00
Simon Pilgrim	2d89007f61	[X86ISelLowering] combineCMov - cleanup CMOV->LEA codegen. NFCI. Only compute the diff once and we don't need the truncation code (assert the bitwidth is correct just to be safe). llvm-svn: 370583	2019-08-31 14:18:26 +00:00
Simon Pilgrim	7238353da2	[X86ISelLowering] LowerSELECT - remove duplicate value type. NFCI. VT of SELECT result and selection ops will be the same. llvm-svn: 370581	2019-08-31 13:14:52 +00:00
Reid Kleckner	0bb1630685	[Windows] Disable TrapUnreachable for Win64, add SEH_NoReturn Users have complained llvm.trap produce two ud2 instructions on Win64, one for the trap, and one for unreachable. This change fixes that. TrapUnreachable was added and enabled for Win64 in r206684 (April 2014) to avoid poorly understood issues with the Windows unwinder. There seem to be two major things in play: - the unwinder - C++ EH, _CxxFrameHandler3 & co The unwinder disassembles forward from the return address to scan for epilogues. Inserting a ud2 had the effect of stopping the unwinder, and ensuring that it ran the EH personality function for the current frame. However, it's not clear what the unwinder does when the return address happens to be the last address of one function and the first address of the next function. The Visual C++ EH personality, _CxxFrameHandler3, needs to figure out what the current EH state number is. It does this by consulting the ip2state table, which maps from PC to state number. This seems to go wrong when the return address is the last PC of the function or catch funclet. I'm not sure precisely which system is involved here, but in order to address these real or hypothetical problems, I believe it is enough to insert int3 after a call site if it would otherwise be the last instruction in a function or funclet. I was able to reproduce some similar problems locally by arranging for a noreturn call to appear at the end of a catch block immediately before an unrelated function, and I confirmed that the problems go away when an extra trailing int3 instruction is added. MSVC inserts int3 after every noreturn function call, but I believe it's only necessary to do it if the call would be the last instruction. This change inserts a pseudo instruction that expands to int3 if it is in the last basic block of a function or funclet. I did what I could to run the Microsoft compiler EH tests, and the ones I was able to run showed no behavior difference before or after this change. Differential Revision: https://reviews.llvm.org/D66980 llvm-svn: 370525	2019-08-30 20:46:39 +00:00
Craig Topper	18e8d02e8c	[X86] Pass v32i16/v64i8 in zmm registers on KNL target. gcc and icc pass these types in zmm registers in zmm registers. This patch implements a quick hack to override the register type before calling convention handling to one that is legal. Longer term we might want to do something similar to 256-bit integer registers on AVX1 where we just split all the operations. Fixes PR42957 Differential Revision: https://reviews.llvm.org/D66708 llvm-svn: 370495	2019-08-30 17:35:08 +00:00
Simon Pilgrim	3d705a1fa4	[X86][SSE] combinePMULDQ - pmuldq(x, 0) -> zero vector (PR43159) ISD::isBuildVectorAllZeros permits undef elements to be present, which means we can't return it as a zero vector. PMULDQ/PMULUDQ is an extending multiply so a multiply by zero of the lower 32-bits should result in a zero 64-bit element. llvm-svn: 370404	2019-08-29 20:22:08 +00:00
Roman Lebedev	cc7495a355	[X86][CodeGen][NFC] Delay `combineIncDecVector()` from DAGCombine to X86DAGToDAGISel Summary: We were previously doing it in DAGCombine. But we also want to do `sub %x, C` -> `add %x, (sub 0, C)` for vectors in DAGCombine. So if we had `sub %x, -1`, we'll transform it to `add %x, 1`, which `combineIncDecVector()` will immediately transform back into `sub %x, -1`, and here we go again... I've marked this as NFC since not a single test changes, but since that 'changes' DAGCombine, probably this isn't fully NFC. Reviewers: RKSimon, craig.topper, spatel Reviewed By: craig.topper Subscribers: hiraditya, llvm-commits Tags: #llvm Differential Revision: https://reviews.llvm.org/D62327 llvm-svn: 370327	2019-08-29 10:50:09 +00:00
Craig Topper	1ec5c204b8	[X86] Add a DAG combine to combine INSERTPS and VBROADCAST of a scalar load. Remove corresponding isel patterns. We had an isel pattern to perform this, but its better to do it in DAG combine as a simplification. This also fixes the lack of patterns for AVX512 targets. llvm-svn: 370294	2019-08-29 05:48:48 +00:00
Craig Topper	1aadf6f39f	[X86] Make inline assembly 'x' and 'v' constraints work for f128. Including a type legalizer fix to make bitcast operand promotion work correctly when getSoftenedFloat returns f128 instead of i128. Fixes PR43157 llvm-svn: 370293	2019-08-29 05:13:56 +00:00
Hans Wennborg	cff90f07cb	[SelectionDAG] Don't generate libcalls for wide shifts on Windows (PR42711) Neither libgcc or compiler-rt are usually used on Windows, so these functions can't be called. Differential revision: https://reviews.llvm.org/D66880 llvm-svn: 370204	2019-08-28 13:55:10 +00:00
Simon Pilgrim	8912e2af39	[X86][AVX] Add SimplifyDemandedVectorElts support for KSHIFTL/KSHIFTR Differential Revision: https://reviews.llvm.org/D66527 llvm-svn: 370055	2019-08-27 13:13:17 +00:00
Craig Topper	6db7f492d9	[X86] Delay combineIncDecVector until after op legalization. Probably better to keep add over sub in early DAG combines. It might make sense to push this to lowering or delay it all the way to isel. But this was the simplest change. llvm-svn: 369981	2019-08-26 22:17:54 +00:00
Craig Topper	36d1588f01	[X86] Add a hack to combinePMULDQ to manually turn SIGN_EXTEND_VECTOR_INREG/ZERO_EXTEND_VECTOR_INREG inputs into an ANY_EXTEND_VECTOR_INREG style shuffle ANY_EXTEND_VECTOR_INREG isn't currently marked Legal which prevents SimplifyDemandedBits from turning SIGN/ZERO_EXTEND_VECTOR_INREG into it after op legalization. And even if we did make it Legal, combineExtInVec doesn't do shuffle combining on the VECTOR_INREG nodes until AVX1. This patch adds a quick hack to combinePMULDQ to directly emit a vector shuffle corresponding to an ANY_EXTEND_VECTOR_INREG operation. This avoids both of those issues without creating any other regressions on our tests. The xop-ifma.ll change here also showed up when I tried to resurrect D56306 and seemed to be the only improvement that patch creates now. This is a more direct way to get the benefit. Differential Revision: https://reviews.llvm.org/D66436 llvm-svn: 369942	2019-08-26 18:23:26 +00:00
Craig Topper	b8b90ac1c5	[X86][DAGCombiner] Teach narrowShuffle to use concat_vectors instead of inserting into undef Summary: Concat_vectors is more canonical during early DAG combine. For example, its what's used by SelectionDAGBuilder when converting IR shuffles into SelectionDAG shuffles when element counts between inputs and mask don't match. We also have combines in DAGCombiner than can pull concat_vectors through a shuffle. See partitionShuffleOfConcats. So it seems like concat_vectors is a better operation to use here. I had to teach DAGCombiner's SimplifyVBinOp to also handle concat_vectors with undef. I haven't checked yet if we can remove the INSERT_SUBVECTOR version in there or not. I didn't want to mess with the other caller of getShuffleHalfVectors that's used during shuffle lowering where insert_subvector probably is what we want to produce so I've enabled this via a boolean passed to the function. Reviewers: spatel, RKSimon Reviewed By: RKSimon Subscribers: hiraditya, llvm-commits Tags: #llvm Differential Revision: https://reviews.llvm.org/D66504 llvm-svn: 369872	2019-08-25 17:59:49 +00:00
Craig Topper	dd2cf78381	[X86] Add an assert to mark more code that needs to be removed when the vector widening legalization switch is removed again. llvm-svn: 369837	2019-08-24 05:59:46 +00:00
Craig Topper	bc173d4c51	[X86] Move a transform out of combineConcatVectorOps so we don't prematurely turn CONCAT_VECTORS into INSERT_SUBVECTORS. CONCAT_VECTORS and INSERT_SUBVECTORS can both call combineConcatVectorOps, but we shouldn't produce INSERT_SUBVECTORS from there. We should keep CONCAT_VECTORS until vector legalization. Noticed while looking at the madd_quad_reduction test from madd.ll llvm-svn: 369802	2019-08-23 19:52:24 +00:00
Craig Topper	e7211bb567	[SelectionDAG][X86] Enable iX SimplifyDemandedBits to vXi1 SimplifyDemandedVectorElts simplification. Add a hack to X86 to avoid a regression Patch showing the effect of enabling bool vector oversimplification. Non-VLX builds can simplify a kshift shuffle, but VLX builds simplify: insert_subvector v8i zeroinitializer, v2i --> insert_subvector v8i undef, v2i Preventing the removal of the AND to clear the upper bits of result Differential Revision: https://reviews.llvm.org/D53022 llvm-svn: 369780	2019-08-23 17:14:58 +00:00
Simon Pilgrim	c88408cf85	Use VT::getHalfNumVectorElementsVT helpers in a few places. NFCI. llvm-svn: 369751	2019-08-23 12:37:09 +00:00
Craig Topper	4deb388bca	[X86] Make combineLoopSADPattern use CONCAT_VECTORS instead of INSERT_SUBVECTORS for widening with zeros. CONCAT_VECTORS is more canonical for the early DAG combine runs until we start getting into the op legalization phases. llvm-svn: 369734	2019-08-23 06:08:33 +00:00
Craig Topper	bdceb9fb14	[X86] Improve lowering of v2i32 SAD handling in combineLoopSADPattern. For v2i32 we only feed 2 i8 elements into the psadbw instructions with 0s in the other 14 bytes. The resulting psadbw instruction will produce zeros in bits [127:16] of the output. We need to take the result and feed it to a v2i32 add where the first element includes bits [15:0] of the sad result. The other element should be zero. Prior to this patch we were using a truncate to take 0 from bits 95:64 of the psadbw. This results in a pshufd to move those bits to 63:32. But since we also have zeroes in bits 63:32 of the psadbw output, we should just take those bits. The previous code probably worked better with promoting legalization, but now we use widening legalization. I've preserved the old behavior if -x86-experimental-vector-widening-legalization=false until we get that option removed. llvm-svn: 369733	2019-08-23 05:33:27 +00:00
Simon Pilgrim	6dd51c2f19	[MVT] Add MVT equivalent to EVT::getHalfNumVectorElementsVT() helper. NFCI. Allows for some cleanup in a lot of SSE/AVX vector splitting code llvm-svn: 369640	2019-08-22 11:14:30 +00:00
Craig Topper	ba375263e8	[DAGCombiner][X86] Teach visitCONCAT_VECTORS to combine (concat_vectors (concat_vectors X, Y), undef)) -> (concat_vectors X, Y, undef, undef) I also had to add a new combine to X86's combineExtractSubvector to prevent a regression. This helps our vXi1 code see the full concat operation and allow it optimize undef to a zero if there is already a zero in the concat. This helped us use a movzx instead of an AND in some of the tests. In those tests, one concat comes from SelectionDAGBuilder and the second comes from type legalization of v4i1->i4 bitcasts which uses an additional concat. Though these changes weren't my original motivation. I'm looking at making X86ISelLowering's narrowShuffle emit a concat_vectors instead of an insert_subvector since concat_vectors is more canonical during early DAG combine. This patch helps prevent a regression from my experiments with that. Differential Revision: https://reviews.llvm.org/D66456 llvm-svn: 369459	2019-08-20 22:12:50 +00:00
Craig Topper	3a2b08e6c9	[X86] Add a DAG combine to transform (i8 (bitcast (v8i1 (extract_subvector (v16i1 X), 0)))) -> (i8 (trunc (i16 (bitcast (v16i1 X))))) on KNL target Without AVX512DQ we don't have KMOVB so we can't really copy 8-bits of a k-register to a GPR. We have to copy 16 bits instead. We do this even if the DAG copy is from v8i1->v16i1. If we detect the (i8 (bitcast (v8i1 (extract_subvector (v16i1 X), 0)))) we should rewrite the types to match the copy we do support. By doing this, we can help known bits to propagate without losing the upper 8 bits of the input to the extract_subvector. This allows some zero extends to be removed since we have an isel pattern to use kmovw for (zero_extend (i16 (bitcast (v16i1 X))). Differential Revision: https://reviews.llvm.org/D66489 llvm-svn: 369434	2019-08-20 20:20:04 +00:00
Craig Topper	22ac9f396f	[X86] Use isNullConstant instead of getConstantOperandVal == 0. NFC llvm-svn: 369410	2019-08-20 16:55:12 +00:00
Craig Topper	1ada137854	[X86] Add back the -x86-experimental-vector-widening-legalization comand line flag and all associated code, but leave it enabled by default Google is reporting performance issues with the new default behavior and have asked for a way to switch back to the old behavior while we investigate and make fixes. I've restored all of the code that had since been removed and added additional checks of the command flag onto code paths that are not otherwise guarded by a check of getTypeAction. I've also modified the cost model tables to hopefully get us back to the previous costs. Hopefully we won't need to support this for very long since we have no test coverage of the old behavior so we can very easily break it. llvm-svn: 369332	2019-08-20 06:58:00 +00:00

... 4 5 6 7 8 ...

7035 Commits