llvm-project

Commit Graph

Author	SHA1	Message	Date
Craig Topper	d692f0f6c8	[X86] Don't call LowerSETCC from LowerSELECT for STRICT_FSETCC/STRICT_FSETCCS nodes. This causes the STRICT_FSETCC/STRICT_FSETCCS nodes to lowered early while lowering SELECT, but the output chain doesn't get connected. Then we visit the node again when it is its turn because we haven't replaced the use of the chain result. In the case of the fp128 libcall lowering, after D72341 this will cause the libcall to be emitted twice.	2020-01-11 20:43:00 -08:00
Craig Topper	ed679804d5	[TargetLowering][X86] Connect the chain from STRICT_FSETCC in TargetLowering::expandFP_TO_UINT and X86TargetLowering::FP_TO_INTHelper.	2020-01-11 17:50:20 -08:00
Simon Pilgrim	24763734e7	[X86] Fix outdated comment The generic saturated math opcodes are no longer widened inside X86TargetLowering	2020-01-11 14:37:18 +00:00
Simon Pilgrim	ce35010d78	[X86][AVX] Add lowerShuffleAsLanePermuteAndSHUFP lowering Add initial support for lowering v4f64 shuffles to SHUFPD(VPERM2F128(V1, V2), VPERM2F128(V1, V2)), eventually this could be used for v8f32 (and maybe v8f64/v16f32) but I'm being conservative for the initial implementation as only v4f64 can always succeed. This currently is only called from lowerShuffleAsLanePermuteAndShuffle so only gets used for unary shuffles, and we limit this to cases where we use upper elements as otherwise concating 2 xmm shuffles is probably the better case. Helps with poor shuffles mentioned in D66004.	2020-01-11 12:42:00 +00:00
Simon Pilgrim	a5bdada09d	[X86][AVX] lowerShuffleAsLanePermuteAndShuffle - consistently normalize multi-input shuffle elements We only use lowerShuffleAsLanePermuteAndShuffle for unary shuffles at the moment, but we should consistently handle lane index calculations for multiple inputs in both the AVX1 and AVX2 paths. Minor (almost NFC) tidyup as I'm hoping to use lowerShuffleAsLanePermuteAndShuffle for binary shuffles soon.	2020-01-10 17:21:20 +00:00
Matt Arsenault	255cc5a760	CodeGen: Use LLT instead of EVT in getRegisterByName Only PPC seems to be using it, and only checks some simple cases and doesn't distinguish between FP. Just switch to using LLT to simplify use from GlobalISel.	2020-01-09 17:37:52 -05:00
Craig Topper	3811417f39	[X86] Custom type legalize v4i64->v4f32 uint_to_fp on sse4.1 targets in 64-bit mode For v4i64->v4f32 uint_to_fp on pre-avx targets where v4i64 isn't legal we create to v2i64->v2f32 uint_to_fp that need to be shuffled together. Our codegen for v2i64->v2f32 involves detecting if the number is larger than (2^31 - 1), if so we do a special divison by 2 so we can do a signed conversion which we need to scalarize, then do a multiply by 2 at the end if we divided earlier. When v4i64 isn't legal we need to split the checking for a larger number and dividing by 2 into two v2i64 vectors. The scalar part can extract the 4 i64 values from those 4 splits. But we can reassemble the 4 scalar f32 results directly into a single v432 vector. Then we just need to combine the fixup indications from the 2 halves and we can do the final multiply by 2 fixup on all 4 values if needed at once using a single v4f32 blend and v4f32 fadd. Differential Revision: https://reviews.llvm.org/D72368	2020-01-08 10:06:01 -08:00
Wang, Pengfei	9a621de1ec	[X86] Adding fp128 support for strict fcmp Summary: Adding fp128 support for strict fcmp Reviewers: craig.topper, LiuChen3, andrew.w.kaylor, RKSimon, uweigand Subscribers: hiraditya, llvm-commits, LuoYuanke Tags: #llvm Differential Revision: https://reviews.llvm.org/D71897	2020-01-08 12:59:31 +08:00
Craig Topper	9685cf709f	[X86] Enable v2i64->v2f32 uint_to_fp code in ReplaceNodeResults on SSE4.1 target Now that we generate decent code for (v2i64 (setlt zero, X)) on pre-sse4.2 targets I think we can use this now. Differential Revision: https://reviews.llvm.org/D72354	2020-01-07 13:25:29 -08:00
Craig Topper	afa8211e97	[X86] Improve lowering of (v2i64 (setgt X, -1)) on pre-SSE2 targets. Enable v2i64 in foldVectorXorShiftIntoCmp. Similar to D72302 but for the canonical form for the opposite case. I've changed foldVectorXorShiftIntoCmp to form a target independent setcc node instead of PCMPGT now and enabled its for v2i64 on pre-SSE4.2 targets. The setcc should eventually get lowered to PCMPGT or the new v2i64 sequence. Differential Revision: https://reviews.llvm.org/D72318	2020-01-07 11:22:04 -08:00
Craig Topper	b9376690a0	[X86] Improve lowering of v2i64 sign bit tests on pre-sse4.2 targets Without sse4.2 a v2i64 setlt needs to expand into a pcmpgtd, pcmpeqd, 3 shuffles, and 2 logic ops. But if we're only interested in the sign bit of the i64 elements, we can just use one pcmpgtd and shuffle the odd elements to the even elements. Differential Revision: https://reviews.llvm.org/D72302	2020-01-07 11:22:03 -08:00
Simon Pilgrim	0e912e22b6	[X86] Pull out repeated SrcVT.getVectorNumElements() call. NFCI.	2020-01-07 16:51:10 +00:00
Simon Pilgrim	c0365aaaa4	[X86] Standardize shuffle match/lowering function names. NFC. We mainly use lowerShuffle/matchShuffle - replace the (few) lowerVectorShuffle/matchVectorShuffle cases to be consistent.	2020-01-07 13:41:52 +00:00
Craig Topper	6a0564adcf	[X86] Improve v4i32->v4f64 uint_to_fp for AVX1/AVX2 targets. Use zext+or+fsub to do the conversion. Similar to D71971. Differential Revision: https://reviews.llvm.org/D71971	2020-01-06 14:07:35 -08:00
Craig Topper	95840866b7	[X86] Improve v2i64->v2f32 and v4i64->v4f32 uint_to_fp on avx and avx2 targets. Summary: Based on Simon's D52965, but improved to handle strict fp and improve some of the shuffling. Rather than use v2i1/v4i1 and let type legalization continue, just generate all the code with legal types and use an explicit shuffle. I also added an explicit setcc to the v4i64 code to match the semantics of vselect which doesn't just use the sign bit. I'm also using a v4i64->v4i32 truncate instead of the shuffle in Simon's original code. With the setcc this will become a pack. Future work can look into using X86ISD::BLENDV and a different shuffle that only moves the sign bit. Reviewers: RKSimon, spatel Reviewed By: RKSimon Subscribers: hiraditya, llvm-commits Tags: #llvm Differential Revision: https://reviews.llvm.org/D71956	2020-01-05 17:44:08 -08:00
Liu, Chen3	ca3bf289a7	[NFC] Modify the format: Drop the else since we alerady returned in the if.	2020-01-06 09:35:19 +08:00
Simon Pilgrim	e3bd011890	[X86][SSE] Combine combineLogicBlendIntoConditionalNegate for VSELECT nodes (PR43660) Attempt to use combineLogicBlendIntoConditionalNegate for (select M, (sub 0, X), X) -> (sub (xor X, M), M) We limit this to cases that can't easily replace the VSELECT with a shuffle (non-constant masks) or where a BLENDV is likely to occur (which tends to result in slower codegen).	2020-01-05 18:50:44 +00:00
Simon Pilgrim	6a6e6f04ec	[X86] Move combineLogicBlendIntoConditionalNegate before combineSelect. NFCI. Updates function order in preparation of future fix for PR43660	2020-01-05 17:17:41 +00:00
Simon Pilgrim	3db84f142a	[X86] Merge (identical) LowerGC_TRANSITION_START and LowerGC_TRANSITION_END (NFC) Silences a copy+paste analyzer warning - all they are doing are inserting NOOPs in exactly the same way.	2020-01-05 15:24:57 +00:00
Craig Topper	2875cc6b29	[X86] Improve for v2i32->v2f64 uint_to_fp This uses an alternative implementation of this conversion derived from our v2i32->v2f32 handling. We can zero extend the v2i32 to v2i64, or it with the bit representation of 2.0^52 which will give us 2.0^52 plus the 32-bit integer since double's mantissa is 52 bits. Then we just need to subtract 2.0^52 as a double and let the floating point unit normalize the remaining bits into a valid double. This is less instructions then our previous code, but does require a port 5 shuffle for the zero extend or unpack. Differential Revision: https://reviews.llvm.org/D71945	2020-01-03 11:39:08 -08:00
Reid Kleckner	9c2b72821b	Move tail call disabling code to target independent code When the "disable-tail-calls" attribute was added, checks were added for it in various backends. Now this code has proliferated, and it is something the target is responsible for checking. Move that responsibility back to the ISels (fast, global, and SD). There's no major functionality change, except for targets that never implemented this check. This LLVM attribute was originally added in `d9699bc7bd` (2015). Reviewers: echristo, MaskRay Differential Revision: https://reviews.llvm.org/D72118	2020-01-03 11:27:41 -08:00
Craig Topper	bd46e29742	[X86] Re-enable lowerUINT_TO_FP_vXi32 under fast-math by using an FSUB instead of an FADD. Summary: We previously disabled this under fast math due to aggressive reassociation by the machine combiner. But I think we can work around this by using a FSUB instead of FADD for the first operation. This matches the similar algorithm we do for uint_to_fp i64->f64 in TargetLowering::expandUINT_TO_FP. If reassociation hasn't been a problem for that, hopefully its not a problem here. Reviewers: RKSimon, spatel, scanon Reviewed By: spatel Subscribers: hiraditya, llvm-commits Tags: #llvm Differential Revision: https://reviews.llvm.org/D71968	2020-01-02 21:46:53 -08:00
Wang, Pengfei	60333a5317	[X86] Enable strict FP by default and remove option -disable-strictnode-mutation. NFCI.	2020-01-03 10:59:34 +08:00
Wang, Pengfei	9dc9e0ea64	[X86] Optimization of inserting vxi1 sub vector into vXi1 vector Summary: After bugfix the undef value case here, we used more operations to implement inserting vxi1 sub vector into vXi1 vector, I optimize it by use less operations. The history information at https://reviews.llvm.org/D68311 Reviewers: craig.topper, LuoYuanke, yubing, annita.zhang, pengfei, LiuChen3, RKSimon Reviewed By: craig.topper Subscribers: hiraditya, llvm-commits Patch by Xiang Zhang (xiangzhangllvm) Differential Revision: https://reviews.llvm.org/D71917	2020-01-03 09:25:25 +08:00
Craig Topper	c36763d894	[X86] Call SimplifyMultipleUseDemandedBits from combineVSelectToBLENDV if the condition is used by something other than select conditions. We might be able to bypass some nodes on the condition path. Differential Revision: https://reviews.llvm.org/D71984	2020-01-01 11:16:52 -08:00
Liu, Chen3	8af492ade1	add strict float for round operation Differential Revision: https://reviews.llvm.org/D72026	2020-01-01 20:42:12 +08:00
Craig Topper	26bdc603f7	[X86] Constant fold KSHIFT of an all zeros vector to just an all zeros vector.	2019-12-31 15:57:39 -08:00
Craig Topper	1cc8a74de3	[X86] Use carry flag from add for (seteq (add X, -1), -1). If we just subtracted 1 and are checking if the result is -1. We can use the carry flag from the ADD instead of an explicit CMP. I'm using the same checks for the add users as EmitTest. Fixes one case from PR44412 Differential Revision: https://reviews.llvm.org/D72019	2019-12-31 15:05:23 -08:00
Craig Topper	e898ba2d15	[X86] Slightly improve our attempted error recovery for 64-bit -mno-sse2 in LowerCallResult to use FP1 if there are two return values. If the return value is a struct of 2 doubles we need two return registers. If SSE2 is disabled we can't return in XMM registers like the ABI says. After logging an error we attempt to recover by using FP0 instead of an XMM register. But if the return needs two registers, we may have already used FP0. So if the register we were supposed to copy to is XMM1, copy to FP1 in the recovery instead. This seems to fix the assertion/crash in PR44413.	2019-12-31 00:16:13 -08:00
Craig Topper	47a2fd2df4	[X86] Add X86ISD::PCMPGT to SimplifyMultipleUseDemandedBitsForTargetNode. If only the sign bit is demanded, and the LHS is all zeroes, then we can bypass the PCMPGT.	2019-12-30 10:50:25 -08:00
Craig Topper	266cd7717c	[X86] Use APInt::isOneValue and ConstantSDNode::isOne. NFC These are implemented slightly more efficiently than comparing to 1 in the case that the value is more than 64 bits.	2019-12-29 17:35:49 -08:00
Craig Topper	b2f19320dc	[X86] Use isOneConstant to simplify some code. NFC	2019-12-29 16:53:38 -08:00
Craig Topper	599d070910	[X86] Remove dyn_casts to ConstantSDNode for operand 1 of X86ISD::VSRLI/VSRAI/VSRLI. Use getConstantOperandVal and APInt operations. These nodes should only ever be formed with an i8 TargetConstant so we don't need to check for it to be a constant. It's also always 8-bits so we don't need to use APInt compare functions.	2019-12-29 16:53:38 -08:00
Craig Topper	a5c96e326a	[X86] Stop accidentally custom type legalizing v4i32->v4f32 on SSE1 only targets. We had a Custom operation action for v4i32 on SSE1. But since v4i32 isn't legal until SSE2 this was not what was intended. The code that get executed was intended for op legalization and creates a bunch of v4i32 nodes that all end up scalarized.	2019-12-28 23:11:48 -08:00
Craig Topper	ae321faeed	[X86] Remove a redundant (scalar_to_vector (extract_vector_elt X))) in LowerUINT_TO_FP_i32. NFCI	2019-12-28 21:49:22 -08:00
Craig Topper	fca4736874	[X86] Allow v2i32->v2f32 strict and non-strict uint_to_fp to be widened to v4i32->v4f32 under avx512. With avx512vl we get v4i32->v4f32 uint_to_fp instructions. With avx512f we get v16i32->v16f32 instructions which we can use to emulate v4i32->v4f32.	2019-12-27 00:28:44 -08:00
Craig Topper	20aab49492	[X86] Custom widen v2i32->v2f32 strict_sint_to_fp to avoid scalarization.	2019-12-27 00:28:44 -08:00
Fangrui Song	7a7334663c	Delete llvm.{sig,}{setjmp,longjmp} remnant after r136821 Intrinsic has incorrect argument type! i32 (i32) @llvm.setjmp wipes tear	2019-12-27 00:00:14 -08:00
Craig Topper	ecbaf152f8	[X86] Custom widen 128/256-bit vXi32 fp_to_uint on avx512f targets without avx512vl. Similar for vXi64 on avx512dq without avx512vl. Summary: Previously we did this with isel patterns that used garbage in the widened part of the source. But that's not valid for strictfp. So now we custom widen and use zeroes for the widened elemens for strictfp. This replaces D71864. Reviewers: RKSimon, spatel, andrew.w.kaylor, pengfei, LiuChen3 Reviewed By: pengfei Subscribers: hiraditya, llvm-commits Tags: #llvm Differential Revision: https://reviews.llvm.org/D71879	2019-12-26 22:04:40 -08:00
Craig Topper	50fb3957c1	[X86] Custom widen strict v2f32->v2i32 by padding with zeroes. For non-strict, generic type legalization will take care of this, but that doesn't happen currently for strict nodes.	2019-12-26 21:45:18 -08:00
Fangrui Song	c4a97b64e3	[X86] Fix -Wmisleading-indentation after D71892	2019-12-26 21:41:16 -08:00
Craig Topper	53ee806d93	[X86][FPEnv] Promote some float strictfp operations to double on i686-pc-windows-msvc to match what we do for non-strict. The float libcalls are inlined in MSVC's math header where they just cast to double and use the double libcall. Do the same when we emit libcalls.	2019-12-26 20:22:24 -08:00
Craig Topper	a5d266b9cf	[X86] Add custom legalization for strict_uint_to_fp v2i32->v2f32. I believe the algorithm we use for non-strict is exception safe for strict. The fsub won't generate any exceptions. After it we will have an exact version of the i32 integer in a double. Then we just round it to f32. That rounding will generate a precision exception if it can't be represented exactly.	2019-12-26 19:10:26 -08:00
Liu, Chen3	1a7b69f5dd	add custom operation for strict fpextend/fpround Differential Revision: https://reviews.llvm.org/D71892	2019-12-27 08:28:33 +08:00
Eric Christopher	1584e2f987	Remove SrcVT only used in an assert and propagate query.	2019-12-26 15:28:32 -08:00
Craig Topper	f953882113	[X86] Custom widen 128/256-bit vXi32 uint_to_fp on avx512f targets without avx512vl. Similar for vXi64 sint_to_fp/uint_to_fp on avx512dq without avx512vl. Previously we widened these through isel patterns, but that didn't work for STRICT_ nodes. Those need to be padded with zeroes in the upper bits which is harder to do in isel patterns.	2019-12-26 14:46:56 -08:00
Craig Topper	90ff34e6ab	[X86] Add custom widening for v2i32->v2f64 strict_uint_to_fp with AVX512F, but not AVX512VL. Previously we were widening with isel patterns, but that wasn't exception safe for strict FP. So now we widen to v4i32->v4f64 during type legalization. And then let op legalization further widen to v8i32->v8f64. The vec_int_to_fp.ll changes are caused by us no longer narrowing extracts of strict_uint_to_fp to the v4i32->v2f64 instruction without AVX512VL only to have isel rewiden it. Now we just keep it wide throughout. So we don't have an opportunity to narrow the load.	2019-12-26 13:40:56 -08:00
Craig Topper	bb0138729b	[X86] Add custom widening for v2f64->v2i32 strict_fp_to_uint with avx512f, but not avx512vl. AVX512F added instruction for vector fp_to_uint conversions. With AVX512VL we can use a specific instruction that does v2f64->v4i32 with zeroes in the 2 extra elements. For non-strict nodes without AVX512VL we relied on type legalization to turn it to v4f64->v4i32 which would later be widened by op legalization to v8f64->v8i32. But type legalization doesn't currently widen strict nodes since it doesn't know how to safely and efficiently pad the extra elements. But for X86 we know padding with zeroes is safe and efficient so do that ourselves.	2019-12-26 12:42:27 -08:00
Craig Topper	c91bf72e2c	[X86] Merge the SINT_TO_FP/UINT_TO_FP handlers in ReplaceNodeResults since the AVX512DQ+AVX512VL code is very similar in both. NFC	2019-12-26 08:58:34 -08:00
Craig Topper	4e6b0dd681	[X86] Add custom lowering for v2i64->v2f32 strict_sint_to_fp/strict_uint_to_fp for avx512dq+avx512vl targets. With avx512dq+avx512vl we have instruction that implements this and places zeroes in the upper 64-bits of the destination xmm register.	2019-12-26 08:58:34 -08:00
Wang, Pengfei	472bded3ed	[X86] Enable STRICT_SINT_TO_FP/STRICT_UINT_TO_FP on X86 backend Summary: Enable STRICT_SINT_TO_FP/STRICT_UINT_TO_FP on X86 backend Reviewers: craig.topper, RKSimon, LiuChen3, uweigand, andrew.w.kaylor Subscribers: hiraditya, llvm-commits, LuoYuanke Tags: #llvm Differential Revision: https://reviews.llvm.org/D71871	2019-12-26 08:15:13 +08:00
Craig Topper	c5b4a2386b	[X86] Use zero vector to extend to 512-bits for strict_fp_to_uint v2i1->v2f64 on targets with AVX512F, but not AVX512VL. In the worst case, this requires a 128-bit move instruction to implicitly zero the upper bits. In the common case, we should recognize the producing instruction already zeroed the upper bits.	2019-12-25 10:46:00 -08:00
Craig Topper	2498d88259	[X86] Merge together some common code in LowerFP_TO_INT now that we have STRICT_CVTTP2SI/STRICT_CVTTP2UI nodes. NFC	2019-12-25 09:57:27 -08:00
Liu, Chen3	8304781cae	Add missing strict_fp_to_int Differential Revision: https://reviews.llvm.org/D71867	2019-12-25 16:10:10 +08:00
Craig Topper	c06e53119b	[X86] Use 128-bit vector instructions for f32/f64->i64 conversions on 32-bit targets with avx512dq and avx512vl instructions. On 32-bit targets we can't use the scalar instruction so we insert the scalar into a vector and use packed conversions. Previously we used either v4f32->v4i64 or v4f64->v4i64 to avoid some complexity creating target specific ISD opcodes for v4f32->v2i64. But this causes extra vzeroupper instructions and possibly frequency throttling on Intel CPUs. This patch changes this to create a 128-bit vector and uses a target specific ISD opcode if needed.	2019-12-24 11:20:10 -08:00
Craig Topper	a21beccea2	[X86] Add STRICT versions of CVTTP2SI, CVTTP2UI, CMPM, and CMPP. Differential Revision: https://reviews.llvm.org/D71850	2019-12-24 10:07:04 -08:00
Ulrich Weigand	0d3f782e41	[FPEnv][X86] More strict int <-> FP conversion fixes Fix several several additional problems with the int <-> FP conversion logic both in common code and in the X86 target. In particular: - The STRICT_FP_TO_UINT expansion emits a floating-point compare. This compare can raise exceptions and therefore needs to be a strict compare. I've made it signaling (even though quiet would also be correct) as signaling is the more usual default for an LT. This code exists both in common code and in the X86 target. - The STRICT_UINT_TO_FP expansion algorithm was incorrect for strict mode: it emitted two STRICT_SINT_TO_FP nodes and then used a select to choose one of the results. This can cause spurious exceptions by the STRICT_SINT_TO_FP that ends up not chosen. I've fixed the algorithm to use only a single STRICT_SINT_TO_FP instead. - The !isStrictFPEnabled logic in DoInstructionSelection would sometimes do the wrong thing because it calls getOperationAction using the result VT. But for some opcodes, incuding [SU]INT_TO_FP, getOperationAction needs to be called using the operand VT. - Remove some (obsolete) code in X86DAGToDAGISel::Select that would mutate STRICT_FP_TO_[SU]INT to non-strict versions unnecessarily. Reviewed by: craig.topper Differential Revision: https://reviews.llvm.org/D71840	2019-12-23 21:11:45 +01:00
Sanjay Patel	8cefc37be5	[DAGCombine] visitEXTRACT_SUBVECTOR - 'little to big' extract_subvector(bitcast()) support This moves the X86 specific transform from rL364407 into DAGCombiner to generically handle 'little to big' cases (for example: extract_subvector(v2i64 bitcast(v16i8))). This allows us to remove both the x86 implementation and the aarch64 bitcast(extract_subvector(bitcast())) combine. Earlier patches that dealt with regressions initially exposed by this patch: rG5e5e99c041e4 rG0b38af89e2c0 Patch by: @RKSimon (Simon Pilgrim) Differential Revision: https://reviews.llvm.org/D63815	2019-12-23 10:11:45 -05:00
Craig Topper	de2378b4f3	[X86] Fix a KNL miscompile caused by combineSetCC swapping LHS/RHS variables before a later use. The setcc operands are copied into LHS and RHS variables at the top of the function. We also capture the condition code. A later piece of code swaps the operands and changing the CC variable as part of a canonicalization to make some other checks simpler. But we might not make the transform we canonicalized for. So we continue on through the function where we can use the swapped LHS/RHS variables and access the original condition code operand instead of the modified CC variable. This leads to a setcc being created with the original condition code, but with swapped operands. To mitigate this, this patch does a couple things. The LHS/RHS/CC variables are made const to keep them from being modified like this again. The transform that needs the swap now uses temporary copies of the variables. And the transform that used the original condition code operand has been altered to use the CC variable we cached originally. Either of these changes are enough to fix the issue, but doing both to make this code very safe. I also considered rewriting the swap code in some way to check both permutations without explicitly swapping or needing temporary variables, but held off on that. Differential Revision: https://reviews.llvm.org/D71736	2019-12-20 11:24:45 -08:00
Craig Topper	bf507d4259	[X86] Make EmitCmp into a static function and explicitly return chain result for STRICT_FCMP. NFCI The only thing its getting from the X86TargetLowering class is the subtarget which we can easily pass. This function only has one call site now since this might help the compiler inline it. Explicitly return both the flag result and the chain result for STRICT_FCMP nodes. This removes an assumption in the caller that getValue(1) is the right way to get the chain.	2019-12-19 23:03:15 -08:00
Craig Topper	9b6fafa399	[X86] Directly call EmitTest in two places instead of creating a null constant and calling EmitCmp. NFCI EmitCmp will just immediately call EmitTest and discard the null constant only to have EmitTest create it again if it doesn't fold. So just skip all that and go directly to EmitTest.	2019-12-19 23:03:06 -08:00
Liu, Chen3	2f932b5729	Enable STRICT_FP_TO_SINT/UINT on X86 backend This patch is mainly for custom lowering the vector operation. Differential Revision: https://reviews.llvm.org/D71592	2019-12-19 14:49:13 +08:00
Wang, Pengfei	1949235d13	[X86] Add strict fma support Summary: Add strict fma support Reviewers: craig.topper, RKSimon, LiuChen3 Subscribers: hiraditya, llvm-commits, LuoYuanke Tags: #llvm Differential Revision: https://reviews.llvm.org/D71604	2019-12-18 11:44:00 +08:00
Craig Topper	004fdbe041	[X86] Manually format some setOperationAction calls to line up arguments to improve readability. NFC	2019-12-17 16:11:31 -08:00
Kevin P. Neal	b1d8576b0a	This adds constrained intrinsics for the signed and unsigned conversions of integers to floating point. This includes some of Craig Topper's changes for promotion support from D71130. Differential Revision: https://reviews.llvm.org/D69275	2019-12-17 10:06:51 -05:00
Alex Richardson	be15dfa88f	[NFC] Use EVT instead of bool for getSetCCInverse() Summary: The use of a boolean isInteger flag (generally initialized using VT.isInteger()) caused errors in our out-of-tree CHERI backend (https://github.com/CTSRD-CHERI/llvm-project). In our backend, pointers use a separate ValueType (iFATPTR) and therefore .isInteger() returns false. This meant that getSetCCInverse() was using the floating-point variant and generated incorrect code for us: `(void )0x12033091e < (void )0xffffffffffffffff` would return false. Committing this change will significantly reduce our merge conflicts for each upstream merge. Reviewers: spatel, bogner Reviewed By: bogner Subscribers: wuzish, arsenm, sdardis, nemanjai, jvesely, nhaehnle, hiraditya, kbarton, jrtc27, atanasyan, jsji, llvm-commits Tags: #llvm Differential Revision: https://reviews.llvm.org/D70917	2019-12-13 12:22:03 +00:00
Sanjay Patel	cdf5cfea8e	Revert "[SDAG] remove use restriction in isNegatibleForFree() when called from getNegatedExpression()" This reverts commit `d1f0bdf2d2`. The patch can cause infinite loops in DAGCombiner.	2019-12-11 16:56:58 -05:00
Sanjay Patel	d1f0bdf2d2	[SDAG] remove use restriction in isNegatibleForFree() when called from getNegatedExpression() This is an alternate fix for the bug discussed in D70595. This also includes minimal tests for other in-tree targets to show the problem more generally. We check the number of uses as a predicate for whether some value is free to negate, but that use count can change as we rewrite the expression in getNegatedExpression(). So something that was marked free to negate during the cost evaluation phase becomes not free to negate during the rewrite phase (or the inverse - something that was not free becomes free). This can lead to a crash/assert because we expect that everything in an expression that is negatible to be handled in the corresponding code within getNegatedExpression(). This patch skips the use check during the rewrite phase. So we determine that some expression isNegatibleForFree (identically to without this patch), but during the rewrite, don't rely on use counts to decide how to create the optimal expression. Differential Revision: https://reviews.llvm.org/D70975	2019-12-11 13:30:39 -05:00
Craig Topper	935d41e4bd	[X86] Split v64i1 arguments into 2 v32i1s that will be promoted to v32i8 under min-legal-vector-width=256 This is an improvement to `88dacbd436`	2019-12-10 17:29:02 -08:00
Wang, Pengfei	21bc8631fe	[FPEnv][X86] Constrained FCmp intrinsics enabling on X86 Summary: This is a follow up of D69281, it enables the X86 backend support for the FP comparision. Reviewers: uweigand, kpn, craig.topper, RKSimon, cameron.mcinally, andrew.w.kaylor Subscribers: hiraditya, llvm-commits, annita.zhang, LuoYuanke, LiuChen3 Tags: #llvm Differential Revision: https://reviews.llvm.org/D70582	2019-12-11 08:23:09 +08:00
Craig Topper	88dacbd436	[X86] Go back to considering v64i1 as a legal type under min-legal-vector-width=256. Scalarize v64i1 arguments and shuffles under min-legal-vector-width=256. This reverts `3e1aee2ba7` in favor of a different approach. Scalarizing isn't great codegen, but making the type illegal was interfering with k constraint in inline assembly.	2019-12-10 15:07:55 -08:00
Liu, Chen3	bbf7860b93	add support for strict operation fpextend/fpround/fsqrt on X86 backend Differential Revision: https://reviews.llvm.org/D71184	2019-12-10 09:04:28 +08:00
Amara Emerson	84fdd9d7a5	[X86] Fix prolog/epilog mismatch for stack protectors on win32-macho. The xor'ing behaviour is only used for msvc/crt environments, when we're targeting macho the guard load code doesn't know about the xor in the epilog. Disable xor'ing when targeting win32-macho to be consistent. Differential Revision: https://reviews.llvm.org/D71095	2019-12-06 14:44:56 -08:00
Craig Topper	28b573d249	[TargetLowering] Fix another potential FPE in expandFP_TO_UINT D53794 introduced code to perform the FP_TO_UINT expansion via FP_TO_SINT in a way that would never expose floating-point exceptions in the intermediate steps. Unfortunately, I just noticed there is still a way this can happen. As discussed in D53794, the compiler now generates this sequence: // Sel = Src < 0x8000000000000000 // Val = select Sel, Src, Src - 0x8000000000000000 // Ofs = select Sel, 0, 0x8000000000000000 // Result = fp_to_sint(Val) ^ Ofs The problem is with the Src - 0x8000000000000000 expression. As I mentioned in the original review, that expression can never overflow or underflow if the original value is in range for FP_TO_UINT. But I missed that we can get an Inexact exception in the case where Src is a very small positive value. (In this case the result of the sub is ignored, but that doesn't help.) Instead, I'd suggest to use the following sequence: // Sel = Src < 0x8000000000000000 // FltOfs = select Sel, 0, 0x8000000000000000 // IntOfs = select Sel, 0, 0x8000000000000000 // Result = fp_to_sint(Val - FltOfs) ^ IntOfs In the case where the value is already in range of FP_TO_SINT, we now simply compute Val - 0, which now definitely cannot trap (unless Val is a NaN in which case we'd want to trap anyway). In the case where the value is not in range of FP_TO_SINT, but still in range of FP_TO_UINT, the sub can never be inexact, as Val is between 2^(n-1) and (2^n)-1, i.e. always has the 2^(n-1) bit set, and the sub is always simply clearing that bit. There is a slight complication in the case where Val is a constant, so we know at compile time whether Sel is true or false. In that scenario, the old code would automatically optimize the sub away, while this no longer happens with the new code. Instead, I've added extra code to check for this case and then just fall back to FP_TO_SINT directly. (This seems to catch even slightly more cases.) Original version of the patch by Ulrich Weigand. X86 changes added by Craig Topper Differential Revision: https://reviews.llvm.org/D67105	2019-12-06 14:11:04 -08:00
Reid Kleckner	c089f02898	[X86] Don't setup and teardown memory for a musttail call Summary: musttail calls should not require allocating extra stack for arguments. Updates to arguments passed in memory should happen in place before the epilogue. This bug was mostly a missed optimization, unless inalloca was used and store to push conversion fired. If a reserved call frame was used for an inalloca musttail call, the call setup and teardown instructions would be deleted, and SP adjustments would be inserted in the prologue and epilogue. You can see these are removed from several test cases in this change. In the case where the stack frame was not reserved, i.e. call frame optimization fires and turns argument stores into pushes, then the imbalanced call frame setup instructions created for inalloca calls become a problem. They remain in the instruction stream, resulting in a call setup that allocates zero bytes (expected for inalloca), and a call teardown that deallocates the inalloca pack. This deallocation was unbalanced, leading to subsequent crashes. Reviewers: hans Subscribers: hiraditya, llvm-commits Tags: #llvm Differential Revision: https://reviews.llvm.org/D71097	2019-12-06 12:58:54 -08:00
Craig Topper	8267be2995	[X86] Make X86TargetLowering::BuildFILD return a std::pair of SDValues so we explicitly return the chain instead of calling getValue on the single SDValue. We shouldn't assume that the returned result can be used to get the other result. This is prep-work for strict FP where we will also need to pass the chain result along in more cases.	2019-12-05 17:54:21 -08:00
Liu, Chen3	3041434450	Add strict fp support for instructions fadd/fsub/fmul/fdiv Differential Revision: https://reviews.llvm.org/D68757	2019-12-06 09:44:33 +08:00
Craig Topper	3d43c73f26	[X86] Remove override of shouldUseStrictFP_TO_INT for fp80. NFC I suspect this became unnecessary after r354161. Prior to that we may have been going through the default expansion of FP_TO_UINT on 64-bit targets and then ending up back in Custom X86 handling to handle the FP_TO_SINT for it. Now we just Custom handle the FP_TO_UINT directly. We already need to handle it for 32-bit mode during type legalization so we wouldn't save any code by using the default expansion on 64-bit.	2019-12-04 17:58:10 -08:00
Amy Huang	9e978bb01c	Add support for lowering 32-bit/64-bit pointers Summary: This follows a previous patch that changes the X86 datalayout to represent mixed size pointers (32-bit sext, 32-bit zext, and 64-bit) with address spaces (https://reviews.llvm.org/D64931) This patch implements the address space cast lowering to the corresponding sign extension, zero extension, or truncate instructions. Related to https://bugs.llvm.org/show_bug.cgi?id=42359 Reviewers: rnk, craig.topper, RKSimon Subscribers: hiraditya, llvm-commits Tags: #llvm Differential Revision: https://reviews.llvm.org/D69639	2019-12-04 11:39:03 -08:00
Craig Topper	8f73a93b2d	[X86] Add support for STRICT_FP_TO_UINT/SINT from fp128.	2019-11-27 18:38:32 -08:00
Craig Topper	cfce8f2cfb	[X86] Add strict fp support for operations of X87 instructions This is the following patch of D68854. This patch adds basic operations of X87 instructions, including +, -, *, / , fp extensions and fp truncations. Patch by Chen Liu(LiuChen3) Differential Revision: https://reviews.llvm.org/D68857	2019-11-26 10:59:41 -08:00
David Green	b5315ae8ff	[Codegen][ARM] Add addressing modes from masked loads and stores MVE has a basic symmetry between it's normal loads/store operations and the masked variants. This means that masked loads and stores can use pre-inc and post-inc addressing modes, just like the standard loads and stores already do. To enable that, this patch adds all the relevant infrastructure for treating masked loads/stores addressing modes in the same way as normal loads/stores. This involves: - Adding an AddressingMode to MaskedLoadStoreSDNode, along with an extra Offset operand that is added after the PtrBase. - Extending the IndexedModeActions from 8bits to 16bits to store the legality of masked operations as well as normal ones. This array is fairly small, so doubling the size still won't make it very large. Offset masked loads can then be controlled with setIndexedMaskedLoadAction, similar to standard loads. - The same methods that combine to indexed loads, such as CombineToPostIndexedLoadStore, are adjusted to handle masked loads in the same way. - The ARM backend is then adjusted to make use of these indexed masked loads/stores. - The X86 backend is adjusted to hopefully be no functional changes. Differential Revision: https://reviews.llvm.org/D70176	2019-11-26 16:21:01 +00:00
Craig Topper	1b20908334	[X86] Return Op instead of SDValue() for lowering flags_read/write intrinsics Returning SDValue() means we didn't handle it and the common code should try to expand it. But its a target intrinsic so expanding won't do anything and just leave the node alone. But it will print confusing debug messages. By returning Op we tell the common code that the node is legal and shouldn't receive any further processing.	2019-11-25 23:13:30 -08:00
Craig Topper	c43b8ec735	[X86] Add support for STRICT_FP_ROUND/STRICT_FP_EXTEND from/to fp128 to/from f32/f64/f80 in 64-bit mode. These need to emit a libcall like we do for the non-strict version. 32-bit mode needs to SoftenFloat support to be implemented for strict FP nodes. Differential Revision: https://reviews.llvm.org/D70504	2019-11-25 18:18:39 -08:00
Simon Pilgrim	5d9a259ad5	[X86][SSE] Split off generic isLaneCrossingShuffleMask helper. NFC. Avoid MVT dependency which will be needed in a future patch.	2019-11-23 12:41:03 +00:00
Craig Topper	b29e5cdb7c	[X86] Add test cases for most of the constrained fp libcalls with fp128. Add explicit setOperation actions for some to match their none strict counterparts. This isn't required, but makes the code self documenting that we didn't forget about strict fp. I've used LibCall instead of Expand since that's more explicitly what we want. Only lrint/llrint/lround/llround are missing now.	2019-11-21 18:17:59 -08:00
Craig Topper	fc4020dbbe	[X86] Mark fp128 FMA as LibCall instead of Expand. Add STRICT_FMA as well. The Expand code would fall back to LibCall, but this makes it more explicit.	2019-11-21 18:17:57 -08:00
Craig Topper	7696b99258	[LegalizeDAG][X86] Add support for turning STRICT_FADD/SUB/MUL/DIV into libcalls. Use it for fp128 on x86-64. This requires a minor hack for f32/f64 strict fadd/fsub to avoid turning those into libcalls.	2019-11-21 16:19:25 -08:00
Craig Topper	95f44cf44a	[X86] Mark vector STRICT_FADD/STRICT_FSUB as Legal and add mutation to X86ISelDAGToDAG The prevents LegalizeVectorOps from scalarizing them. We'll need to remove the X86 mutation code when we add isel patterns.	2019-11-21 16:19:18 -08:00
Hiroshi Yamauchi	52e377497d	[PGO][PGSO] DAG.shouldOptForSize part. Summary: (Split of off D67120) SelectionDAG::shouldOptForSize changes for profile guided size optimization. Reviewers: davidxl Subscribers: hiraditya, llvm-commits Tags: #llvm Differential Revision: https://reviews.llvm.org/D70095	2019-11-21 14:16:00 -08:00
Craig Topper	1439059cc7	[X86] Change legalization action for f128 fadd/fsub/fmul/fdiv from Custom to LibCall. The custom code just emits a libcall, but we can do the same with generic code. The only difference is that the generic code can form tail calls where the custom code couldn't. This is responsible for the test changes. This avoids needing to modify the Custom handling for strict fp.	2019-11-21 11:44:29 -08:00
Craig Topper	27da569a7a	[X86] Fix i16->f128 sitofp to promote the i16 to i32 before trying to form a libcall. Previously one of the test cases added here gave an error.	2019-11-20 17:09:32 -08:00
Craig Topper	5f3bf5967b	[X86] Fix f128->i16 fptosi to promote the i16 to i32 before trying to form a libcall. Previously one of the test cases added here gave an error.	2019-11-20 17:09:31 -08:00
Craig Topper	7488c0a6f5	[X86] Mark vector STRICT_FP_ROUND as Legal instead of Custom. The Custom handler doesn't do anything for these nodes anyway. SelectionDAGISel won't mutate them if they are Legal or Custom. X86 has custom code for mutating them due to missing isel patterns. When the isel patterns are added Legal will be the right answer. So go ahead a change it now since that's where we'll end up.	2019-11-20 13:03:51 -08:00
Reid Kleckner	606a2bd621	[musttail] Don't forward AL on Win64 AL is only used for varargs on SysV platforms. Don't forward it on Windows. This allows control flow guard to set up an extra hidden parameter in RAX, as described in PR44049. This also has the effect of freeing up RAX for use in virtual member pointer thunks, which may also be a nice little code size improvement on Win64. Fixes PR44049 Reviewers: ajpaverd, efriedma, hans Differential Revision: https://reviews.llvm.org/D70413	2019-11-19 16:54:00 -08:00
Craig Topper	c4b41e8d1d	[LegalizeDAG][X86] Enable STRICT_FP_TO_SINT/UINT to be promoted Differential Revision: https://reviews.llvm.org/D70220	2019-11-19 16:14:37 -08:00
Craig Topper	85589f8077	[X86] Add custom type legalization and lowering for scalar STRICT_FP_TO_SINT/UINT This is a first pass at Custom lowering for these operations. I also updated some of the vector code where it was obviously easy and straightforward. More work needed in follow up. This enables these operations to be handled with X87 where special rounding control adjustments are needed to perform a truncate. Still need to fix Promotion in the target independent code in LegalizeDAG. llrint/llround split into separate test file because we can't make a strict libcall properly yet either and we need to do that when i64 isn't a legal type. This does not include any isel support. So we still rely on the mutation in SelectionDAGIsel to remove the strict from this stuff later. Except for the X87 stuff which goes through custom nodes that already had chains. Differential Revision: https://reviews.llvm.org/D70214	2019-11-19 16:05:22 -08:00
Matt Arsenault	b696b9dba7	DAG: Add function context to isFMAFasterThanFMulAndFAdd AMDGPU needs to know the FP mode for the function to answer this correctly when this is removed from the subtarget. AArch64 had to make this more complicated by using this from an IR hook, so add an IR typed overload.	2019-11-19 19:25:26 +05:30
Simon Pilgrim	bbf4af3109	[X86][SSE] Remove XFormVExtractWithShuffleIntoLoad to prevent legalization infinite loops (PR43971) As detailed in PR43971/D70267, the use of XFormVExtractWithShuffleIntoLoad causes issues where we end up in infinite loops of extract(targetshuffle(vecload)) -> extract(shuffle(vecload)) -> extract(vecload) -> extract(targetshuffle(vecload)), there are just too many legalization checks at every stage that we can't guarantee that extract(shuffle(vecload)) -> scalarload can occur. At the moment we see a number of minor regressions as we don't fold extract(shuffle(vecload)) -> scalarload before legal ops, these can be addressed in future patches and extension of X86ISelLowering's combineExtractWithShuffle.	2019-11-19 11:55:44 +00:00
Graham Hunter	3f08ad611a	[SVE][CodeGen] Scalable vector MVT size queries * Implements scalable size queries for MVTs, split out from D53137. * Contains a fix for FindMemType to avoid using scalable vector type to contain non-scalable types. * Explicit casts for several places where implicit integer sign changes or promotion from 32 to 64 bits caused problems. * CodeGenDAGPatterns will treat scalable and non-scalable vector types as different. Reviewers: greened, cameron.mcinally, sdesmalen, rovka Reviewed By: rovka Differential Revision: https://reviews.llvm.org/D66871	2019-11-18 12:30:59 +00:00

1 2 3 4 5 ...

6845 Commits