llvm-project

Commit Graph

Author	SHA1	Message	Date
Kadir Cetinkaya	d2b6ac6ccd	Revert "[X86] Use X86ISD::SUB instead of X86ISD::CMP in some places." This reverts commit `8413116bf1`. this seems to be causing crashes while compiling ncurses. ``` $ ./bin/llc bugpoint-reduced-simplified.ll LLVM ERROR: Cannot emit physreg copy instruction ``` Here are the crashers: https://gist.github.com/kadircet/918f5bb97a2afe048cb875490edba46e executing with an llc compiled at `904d54de9b` works fine.	2020-02-04 11:22:53 +01:00
Guillaume Chatelet	b8144c0536	[NFC] Encapsulate MemOp logic Summary: This patch simply introduces functions instead of directly accessing the fields. This helps introducing additional check logic. A second patch will add simplifying functions. Reviewers: courbet Subscribers: arsenm, nemanjai, jvesely, nhaehnle, hiraditya, kbarton, jsji, kerbowa, llvm-commits Tags: #llvm Differential Revision: https://reviews.llvm.org/D73945	2020-02-04 10:36:26 +01:00
Craig Topper	cd14b4a62b	[X86] Remove unneeded code that looks for (and (i8 (X86setcc_c)) I don't believe we use this construct anymore so I don't think we need to look for it.	2020-02-03 23:18:11 -08:00
Craig Topper	4581d97416	[X86] Remove some uncovered and possibly broken code from combineZext. This code matches (zext (trunc (setcc_carry))) -> (and (setcc_carry), 1) but the code never checks what type we're truncating too. An and mask of 1 would only make sense if the trunc was to MVT::i1, but we didn't check for that. I believe this code is a leftover from when i1 was a legal type.	2020-02-03 22:59:39 -08:00
Craig Topper	8413116bf1	[X86] Use X86ISD::SUB instead of X86ISD::CMP in some places. Our normal lowering for ISD::SETCC uses X86ISD::SUB to enable CSE unless the RHS is 0. optimizeCompareInstr called by the peephole pass can turn subs with unused results into cmps to clean this up. This commit makes other places that create X86ISD::CMP have the same behavior.	2020-02-03 21:01:11 -08:00
Craig Topper	c3a47221e0	[X86] Don't emit two X86ISD::COMI/UCOMI nodes when handling comi/ucomi intrinsics. We were creating two with different operand orders, and then only using one of them. Instead just swap the operands when needed and create a single node.	2020-02-03 20:08:01 -08:00
Simon Pilgrim	3ece5a23bd	[X86] getTargetShuffleMask - use getConstantOperandVal helper. NFCI.	2020-02-03 18:06:47 +00:00
Simon Pilgrim	8c0e715eb2	[X86] BEXTR SimplifyDemandedBitsForTargetNode - length == 0 -> result = 0	2020-02-03 16:50:03 +00:00
Guillaume Chatelet	333f2ad8b8	[Alignment][NFC] Use Align for getMemcpy/Memmove/Memset Summary: This is patch is part of a series to introduce an Alignment type. See this thread for context: http://lists.llvm.org/pipermail/llvm-dev/2019-July/133851.html See this patch for the introduction of the type: https://reviews.llvm.org/D64790 Reviewers: courbet Subscribers: arsenm, dschuff, jyknight, sdardis, nemanjai, jvesely, nhaehnle, sbc100, jgravelle-google, hiraditya, aheejin, kbarton, fedor.sergeev, asb, rbar, johnrusso, simoncook, sabuasal, niosHD, jrtc27, MaskRay, zzheng, edward-jones, atanasyan, rogfer01, MartinMosbeck, brucehoult, the_o, PkmX, jocewei, jsji, Jim, lenary, s.egerton, pzheng, sameer.abuasal, apazos, luismarques, kerbowa, llvm-commits Tags: #llvm Differential Revision: https://reviews.llvm.org/D73885	2020-02-03 17:13:19 +01:00
Simon Pilgrim	8ead5df0b1	[X86] computeKnownBitsForTargetNode - add BEXTR support (PR39153) Add a KnownBits::extractBits helper	2020-02-03 15:43:59 +00:00
Simon Pilgrim	a9ee3ffbc0	[X86] Move BEXTR DemandedBits handling inside SimplifyDemandedBitsForTargetNode Some prep work for PR39153.	2020-02-03 15:16:40 +00:00
Craig Topper	cf20fde1d1	[X86] Remove a couple unnecessary calls to ConvertCmpIfNecessary. We only need to call this on floating point comparisons. In this case these are known to be integer compares. One of them even has a SUB opcode instead of CMP.	2020-02-02 21:36:51 -08:00
Craig Topper	ee85415dbb	[X86] Use MVT::f80 for the result type of the FLD used to convert from SSE register to X87 register in FP_TO_INTHelper.	2020-02-02 13:24:37 -08:00
Simon Pilgrim	5d86ac82a6	Fix a few spelling mistakes in comments. NFCI.	2020-02-02 18:27:43 +00:00
Simon Pilgrim	17e91b7dd2	[X86][SSE] combineBitcastvxi1 - add pre-AVX512 v64i1 handling	2020-02-02 18:00:09 +00:00
Guillaume Chatelet	3c89b75f23	[NFC] Introduce a type to model memory operation Summary: This is a first step before changing the types to llvm::Align and introduce functions to ease client code. Reviewers: courbet Subscribers: arsenm, sdardis, nemanjai, jvesely, nhaehnle, hiraditya, kbarton, jrtc27, atanasyan, jsji, kerbowa, llvm-commits Tags: #llvm Differential Revision: https://reviews.llvm.org/D73785	2020-01-31 17:29:01 +01:00
Craig Topper	90c31b0f42	[X86] Custom lower ISD::FROUND with SSE4.1 to avoid a libcall. ISD::FROUND is defined to round to nearest with ties rounding away from 0. This mode isn't supported in hardware on X86. But as long as we aren't compiling with trapping math, we can emulate this with floor(X + copysign(nextafter(0.5, 0.0), X)). We have to use nextafter to avoid some corner cases that adding 0.5 would have. For example, if X is nextafter(0.5, 0.0) it should round to 0.0, but adding 0.5 would need one extra bit of mantissa than can be stored so it rounds to 1.0. Adding nextafter(0.5, 0.0) instead will just increase the exponent by 1 and leave the mantissa as all 1s. This would be nextafter(1.0, 0.0) which will floor to 0.0. Techically this requires -fno-trapping-math which isn't our default. But if we care about exceptions we should be using constrained intrinsics. Constrained intrinsics would use STRICT_FROUND which won't go through this code. Fixes PR42195. Differential Revision: https://reviews.llvm.org/D73607	2020-01-29 09:10:02 -08:00
Craig Topper	e5edd641fd	[X86] Use a shorter sequence to implement FLT_ROUNDS This code needs to map from the FPCW 2-bit encoding for rounding mode to the 2-bit encoding defined for FLT_ROUNDS. The previous implementation did some clever swapping of bits and adding 1 modulo 4 to do the mapping. This patch instead uses an 8-bit immediate as a lookup table of four 2-bit values. Then we use the 2-bit FPCW encoding to index the lookup table by using a right shift and an AND. This requires extracting the 2-bit value from FPCW and multipying it by 2 to make it usable as a shift amount. But still results in less code. Differential Revision: https://reviews.llvm.org/D73599	2020-01-29 08:56:33 -08:00
Craig Topper	ca2abea29a	[X86] Use SelectionDAG::getZExtOrTrunc to simplify some code. NFCI	2020-01-28 16:27:59 -08:00
Wang, Pengfei	3d1f0ce3b9	[X86] Add combination for fma and fneg on X86 under strict FP. Summary: X86 has instructions to calculate fma and fneg at the same time. But we combine the fneg and fma only when fneg is the source operand under strict FP. Reviewers: craig.topper, andrew.w.kaylor, uweigand, RKSimon, LiuChen3 Subscribers: LuoYuanke, llvm-commits, cfe-commits, jdoerfert, hiraditya Tags: #llvm Differential Revision: https://reviews.llvm.org/D72824	2020-01-28 20:09:56 +08:00
Simon Pilgrim	2d5e281b0f	[X86][AVX] Add a more aggressive SimplifyMultipleUseDemandedBits to simplify masked store masks. Fixes a poor codegen issue noticed in PR11210.	2020-01-27 16:44:25 +00:00
Simon Pilgrim	fa19d67a2a	[X86][AVX] Extend combineCommutableSHUFP to handle v8f32 and v16f32 commutable shufps patterns	2020-01-26 19:04:12 +00:00
Simon Pilgrim	1a81b296cd	[X86][SSE] combineCommutableSHUFP - permilps(shufps(load(),x)) --> permilps(shufps(x,load())) Pull out combineTargetShuffle code added in rG3fd5d1c6e7db into a helper function and extend it to handle shufps(shufps(load(),x),y) and shufps(y,shufps(load(),x)) cases as well.	2020-01-26 14:36:23 +00:00
Craig Topper	3fdd435a4b	[X86] Use a macro to convert X86ISD names to strings in getTargetNodeName. Every case in the switch had a string version of themselves. Two of them had a typo that used : instead of :: By using a macro we can automate the string creation and avoid the possibility of typos like this. This is similar to what is done on the AMDGPU target.	2020-01-25 18:27:29 -08:00
Craig Topper	2c1decc040	[X86] Break the loop in LowerReturn into 2 loops. NFCI I believe for STRICT_FP I need to use a STRICT_FP_EXTEND for the extending to f80 for returning f32/f64 in 32-bit mode when SSE is enabled. The STRICT_FP_EXTEND node requires a Chain. I need to get that node onto the chain before any CopyToRegs are emitted. This is because all the CopyToRegs are glued and chained together. So I can't put a STRICT_FP_EXTEND on the chain between the glued nodes without also glueing the STRICT_ FP_EXTEND. This patch moves all the extend creation to a first pass and then creates the copytoregs and fills out RetOps in a second pass. Differential Revision: https://reviews.llvm.org/D72665	2020-01-24 14:44:38 -08:00
Simon Pilgrim	3fd5d1c6e7	[X86][SSE] combineTargetShuffle - permilps(shufps(load(),x)) --> permilps(shufps(x,load())) Moves lowerShuffleWithSHUFPS commutation code from rG30fcd29fe479 to catch cases during combine	2020-01-24 15:23:20 +00:00
Simon Pilgrim	30fcd29fe4	[X86][SSE] lowerShuffleWithSHUFPS - commute '2V1+2V2 elements' mask if it allows a loaded fold As mentioned on D73023.	2020-01-24 12:04:10 +00:00
Guillaume Chatelet	805c157e8a	[Alignment][NFC] Deprecate Align::None() Summary: This is a follow up on https://reviews.llvm.org/D71473#inline-647262. There's a caveat here that `Align(1)` relies on the compiler understanding of `Log2_64` implementation to produce good code. One could use `Align()` as a replacement but I believe it is less clear that the alignment is one in that case. Reviewers: xbolva00, courbet, bollu Subscribers: arsenm, dylanmckay, sdardis, nemanjai, jvesely, nhaehnle, hiraditya, kbarton, jrtc27, atanasyan, jsji, Jim, kerbowa, cfe-commits, llvm-commits Tags: #clang, #llvm Differential Revision: https://reviews.llvm.org/D73099	2020-01-24 12:53:58 +01:00
Simon Pilgrim	0ec25a0316	[X86] LowerRotate - early out for vector rotates by zero	2020-01-23 17:48:09 +00:00
Guillaume Chatelet	279fa8e006	[Alignement][NFC] Deprecate untyped CreateAlignedLoad Summary: This is patch is part of a series to introduce an Alignment type. See this thread for context: http://lists.llvm.org/pipermail/llvm-dev/2019-July/133851.html See this patch for the introduction of the type: https://reviews.llvm.org/D64790 Reviewers: courbet Subscribers: arsenm, jvesely, nhaehnle, hiraditya, kerbowa, llvm-commits Tags: #llvm Differential Revision: https://reviews.llvm.org/D73260	2020-01-23 13:34:32 +01:00
Sanjay Patel	363d27c871	[x86] fold vperm2x128 to concat of 128-bit high half vectors vperm (ins ?, X, C), (ins ?, Y, C), 0x31 --> concat X, Y This is another shuffle problem seen with PR42024: https://bugs.llvm.org/show_bug.cgi?id=42024 We have this small crack in legalization/lowering/combining/demanded that allows forming a vperm2f128 of high halves with AVX1 when we could do better by peeking through the insert_subvector nodes. AFAICT, it requires IR as shown in the diffs - much larger than legal vectors - to avoid all of the usual folds. Another option would prevent forming the 256-bit vperm in lowering. Differential Revision: https://reviews.llvm.org/D73197	2020-01-22 15:35:50 -05:00
Simon Pilgrim	5340434c94	[X86][SSE] combineExtractWithShuffle - extract(bitcast(broadcast(x))) --> x Removes some unnecessary gpr<-->fpu traffic	2020-01-22 18:02:58 +00:00
Simon Pilgrim	a14aa7dabd	[X86][SSE] combineExtractWithShuffle - extract(bictcast(scalar_to_vector(x))) --> x Removes some unnecessary gpr<-->fpu traffic	2020-01-22 16:11:08 +00:00
Simon Pilgrim	c784e5451b	Use SelectionDAG::getShiftAmountConstant(). NFCI.	2020-01-22 13:52:43 +00:00
Simon Pilgrim	963f268186	[X86][SSE] combineExtractWithShuffle - pull out repeated extract index code. NFCI.	2020-01-22 12:08:58 +00:00
Simon Pilgrim	b065902ed4	[X86] combineBT - use SimplifyDemandedBits instead of GetDemandedBits Another step towards removing SelectionDAG::GetDemandedBits entirely	2020-01-21 14:24:46 +00:00
Simon Pilgrim	eaa4548459	[X86][SSE] Add PACKSS SimplifyMultipleUseDemandedBits 'sign bit' handling. Attempt to use SimplifyMultipleUseDemandedBits to simplify PACKSS if we're only after the sign bit.	2020-01-20 10:48:54 +00:00
Florian Hahn	0ee1db2d1d	[X86] Try to avoid casts around logical vector ops recursively. Currently PromoteMaskArithemtic only looks at a single operation to skip casts. This means we miss cases where we combine multiple masks. This patch updates PromoteMaskArithemtic to try to recursively promote AND/XOR/AND nodes that terminate in truncates of the right size or constant vectors. Reviewers: craig.topper, RKSimon, spatel Reviewed By: RKSimon Differential Revision: https://reviews.llvm.org/D72524	2020-01-19 17:22:43 -08:00
Craig Topper	5fa2022ec0	[X86] Remove X86ISD::FILD_FLAG and stop gluing nodes together. Summary: I think whatever problem the gluing was fixing has long since been fixed. We don't have any of the restrictions on FP stack stuff that existed back when this was first added. I had to change which type we use for FILD in BuildFILD when X86 was enabled because most of the isel patterns block f32/f64 instructions when SSE1/SSE2 are enabled. So I needed to use the f80 pattern, but this shouldn't have an effect the generated code since there is only one FILD instruction anyway. We already use f80 explicitly in other other places. Reviewers: RKSimon, spatel Reviewed By: RKSimon Subscribers: andrew.w.kaylor, scanon, hiraditya, llvm-commits Tags: #llvm Differential Revision: https://reviews.llvm.org/D72805	2020-01-18 23:44:05 -06:00
Simon Pilgrim	69bc450882	[X86] Rename lowerShuffleAsRotate -> lowerShuffleAsVALIGN Since it can only ever create VALIGN nodes.	2020-01-18 11:29:14 +00:00
Michael Liao	6d0d86a64d	[DAG] Add helper for creating constant vector index with correct type. NFC.	2020-01-18 01:23:36 -05:00
Sanjay Patel	43f60e614a	[x86] try harder to form 256-bit unpck* This is another part of a problem noted in PR42024: https://bugs.llvm.org/show_bug.cgi?id=42024 The AVX2 code may use awkward 256-bit shuffles vs. the AVX code that gets split into the expected 128-bit unpack instructions. We have to be selective in matching the types where we try to do this though. Otherwise, we can end up with more instructions (in the case of v8x32/v4x64). Differential Revision: https://reviews.llvm.org/D72575	2020-01-17 10:42:39 -05:00
Craig Topper	e445447921	[X86] When handling i64->f32 sint_to_fp on 32-bit targets only bitcast to f64 if sse2 is enabled. The code is trying to copy the i64 value to an xmm register to use a 64-bit store so that the 64-bit fild can benefit from store forwarding. But this trick only works if f64 is going to be stored in an XMM register. If we only have SSE1 then only float is in xmm register. So this trick just causes 2 stores i32 stores, an f64 load into the x87, an f64 from x87, and a 64-bit fild. So we end up with an extra stack temporary and still didn't get store forwarding. We might be able to use v2f32 here instead, but I didn't check. I just wanted the code to make sense. Found by inspection as I continue to stare too hard at our int_to_fp conversions.	2020-01-15 18:26:28 -08:00
Craig Topper	be8f217b18	[X86] Don't call LowerUINT_TO_FP_i32 for i32->f80 on 32-bit targets with sse2. We were performing an emulated i32->f64 in the SSE registers, then storing that value to memory and doing a extload into the X87 domain. After this patch we'll now just store the i32 to memory along with an i32 0. Then do a 64-bit FILD to f80 completely in the X87 unit. This matches what we do without SSE.	2020-01-15 00:43:07 -08:00
Reid Kleckner	40cd26c700	[Win64] Handle FP arguments more gracefully under -mno-sse Pass small FP values in GPRs or stack memory according the the normal convention. This is what gcc -mno-sse does on Win64. I adjusted the conditions under which we emit an error to check if the argument or return value would be passed in an XMM register when SSE is disabled. This has a side effect of no longer emitting an error for FP arguments marked 'inreg' when targetting x86 with SSE disabled. Our calling convention logic was already assigning it to FP0/FP1, and then we emitted this error. That seems unnecessary, we can ignore 'inreg' and compile it without SSE. Reviewers: jyknight, aemerson Differential Revision: https://reviews.llvm.org/D70465	2020-01-14 17:19:35 -08:00
Craig Topper	76291e1158	[X86] Drop an unneeded FIXME. NFC The extload on X87 is free.	2020-01-14 17:05:46 -08:00
Craig Topper	57eb56b839	[X86] Swap the 0 and the fudge factor in the constant pool for the 32-bit mode i64->f32/f64/f80 uint_to_fp algorithm. This allows us to generate better code for selecting the fixup to load. Previously when the sign was set we had to load offset 0. And when it was clear we had to load offset 4. This required a testl, setns, zero extend, and finally a mul by 4. By switching the offsets we can just shift the sign bit into the lsb and multiply it by 4.	2020-01-14 17:05:23 -08:00
Craig Topper	98c54fb1fe	[X86] Directly emit a BROADCAST_LOAD from constant pool in lowerUINT_TO_FP_vXi32 to avoid double loads seen in D71971 By directly emitting the constants as a constant pool load we seem to avoid the build_vector/extract_subvector combines that resulted in the duplicate loads we had before. Differential Revision: https://reviews.llvm.org/D72307	2020-01-14 10:50:39 -08:00
Simon Pilgrim	66e39067ed	[X86][AVX] Use lowerShuffleAsLanePermuteAndSHUFP to lower binary v4f64 shuffles. Only perform this if we are shuffling lower and upper lane elements across the lanes (otherwise splitting to lower xmm shuffles would be better). This is a regression if we shuffle build_vectors due to getVectorShuffle canonicalizing 'blend of splat' build vectors, for now I've set this not to shuffle build_vector nodes at all to avoid this.	2020-01-12 12:29:41 +00:00
Simon Pilgrim	b375f28b0e	[X86][AVX] lowerShuffleAsLanePermuteAndSHUFP - only set the demanded elements of the lane mask. Fixes an cyclic dependency issue with an upcoming patch where getVectorShuffle canonicalizes masks with splat build vector sources.	2020-01-12 09:41:40 +00:00

1 2 3 4 5 ...

6845 Commits