This removes the after-the-fact FMF handling from D46854 in favor of passing fast math flags to getNode. This should be a superset of D87130.
This required adding an SDNodeFlags argument to SelectionDAG::getSetCC.
Now we manage to constant fold some cases involving undef during the
initial getNode that we don't handle in later DAG combines.
Differential Revision: https://reviews.llvm.org/D87200
Rather than using SELECT instructions, use SRA, UADDO/ADDCARRY and
XORs to expand ABS. This is the multi-part version of the sequence
we use in LegalizeDAG.
It's also the same as the custom sequence used for i64 on 32-bit
targets and i128 on 64-bit targets, so we can remove the X86 customization.
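For reference, here is a minimal scalar sketch of what the expanded two-part sequence computes. This is illustrative C++ only, not the DAG nodes the patch emits; the 32-bit part width and the helper name are just examples.

```
#include <cstdint>

// abs of a 64-bit value built from two 32-bit parts, mirroring the
// SRA + UADDO/ADDCARRY + XOR expansion described above.
uint64_t abs_i64_from_parts(uint32_t lo, uint32_t hi) {
  uint32_t sign = (uint32_t)((int32_t)hi >> 31);   // SRA: all-ones if negative
  uint64_t lo_sum = (uint64_t)lo + sign;           // UADDO: add, capture carry
  uint32_t carry = (uint32_t)(lo_sum >> 32);
  uint32_t new_lo = (uint32_t)lo_sum ^ sign;       // XOR with the sign mask
  uint32_t new_hi = (hi + sign + carry) ^ sign;    // ADDCARRY, then XOR
  return ((uint64_t)new_hi << 32) | new_lo;
}
```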
Reviewed By: RKSimon
Differential Revision: https://reviews.llvm.org/D87215
We already simplify the unsigned comparisons if we've found the operands are non-negative, but we were still calling LowerVSETCCWithSUBUS, which resulted in the PR47448 regressions.
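As background, a rough sketch of the SUBUS trick that LowerVSETCCWithSUBUS's name refers to, using SSE2 intrinsics (my own illustration, not code from the patch): an unsigned "a <= b" compare can be built from a saturating subtract plus a compare against zero.

```
#include <immintrin.h>

// a <= b (unsigned, per 16-bit lane): the saturating subtract clamps to zero
// exactly when a <= b, so comparing the difference against zero gives the mask.
__m128i cmple_epu16(__m128i a, __m128i b) {
  __m128i diff = _mm_subs_epu16(a, b);
  return _mm_cmpeq_epi16(diff, _mm_setzero_si128());
}
```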
lowerShuffleWithPERMV allows us to use the ZMM variants for 128/256-bit variable shuffles on non-VLX AVX512 targets.
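The widening idea, sketched with intrinsics (my illustration, assuming AVX-512F without VLX and shuffle indices in the 0..7 range): pad the 256-bit operands to 512 bits, run the ZMM vpermd, and take the low half back.

```
#include <immintrin.h>

// 256-bit variable dword shuffle on a non-VLX AVX-512 target: widen, use the
// ZMM VPERMD, then extract the low 256 bits. Indices are assumed to be 0..7,
// so the undefined upper half of the widened source is never selected.
__m256i var_shuffle_epi32_256(__m256i v, __m256i idx) {
  __m512i v512 = _mm512_castsi256_si512(v);
  __m512i idx512 = _mm512_castsi256_si512(idx);
  __m512i res = _mm512_permutexvar_epi32(idx512, v512);
  return _mm512_castsi512_si256(res);
}
```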
This is another step towards shuffle combining across vector widths - we still end up with an annoying regression (combine_vpermilvar_vperm2f128_zero_8f32), but we're going in the right direction.
rGabd33bf5eff2 enabled us to pad 128/256-bit shuffles to 512-bit on non-VLX targets, but wasn't updating binary shuffles to account for the new vector width.
This can cause an infinite loop if SimplifyDemandedElts asks
for the node to replace itself.
A similar protection exists in other places in shuffle combining.
Fixes ISPC https://github.com/ispc/ispc/issues/1864
Extends lowerShuffleAsLanePermuteAndPermute to search for opportunities to use vpermq (64-bit cross-lane shuffle) and vpermd (32-bit cross-lane shuffle) to get elements into the correct lane, in addition to the 128-bit full-lane permutes it previously searched for.
This is especially helpful in cross-lane byte shuffles, where the alternative tends to be "vpshufb both lanes separately and blend them with a vpblendvb", which is very expensive, especially on Haswell where vpblendvb uses the same execution port as all the shuffles.
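To illustrate the shape of the two-step lowering, here is my own AVX2 intrinsics example (not taken from the patch): a full 32-byte reverse first uses a cross-lane vpermq to put the right 64-bit chunks into each 128-bit lane, then an in-lane vpshufb to order the bytes, avoiding the shuffle-both-lanes-and-vpblendvb fallback.

```
#include <immintrin.h>

__m256i reverse_bytes_avx2(__m256i v) {
  // Step 1: swap the two 128-bit lanes (cross-lane part, via vpermq).
  __m256i swapped = _mm256_permute4x64_epi64(v, _MM_SHUFFLE(1, 0, 3, 2));
  // Step 2: reverse bytes within each 128-bit lane (in-lane part, via vpshufb).
  const __m256i rev = _mm256_setr_epi8(
      15, 14, 13, 12, 11, 10, 9, 8, 7, 6, 5, 4, 3, 2, 1, 0,
      15, 14, 13, 12, 11, 10, 9, 8, 7, 6, 5, 4, 3, 2, 1, 0);
  return _mm256_shuffle_epi8(swapped, rev);
}
```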
Addresses PR47262
Patch By: @TellowKrinkle (TellowKrinkle)
Differential Revision: https://reviews.llvm.org/D86429
If the PSHUFBs have no other uses, then we can force the unselected elements to zero to OR them instead, avoiding both an extra mask load and a costly variable blend.
Eventually we should try to bring this into shuffle combining, once we can more easily convert between shuffles + select patterns.
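The underlying property is that pshufb writes zero to any destination byte whose control byte has its high bit set, so two shuffles that select disjoint bytes can be merged with a plain OR. A small intrinsics sketch of the idea (illustrative only; it assumes the two masks select disjoint byte positions):

```
#include <immintrin.h>

__m128i shuffle_then_or(__m128i a, __m128i b, __m128i mask_a, __m128i mask_b) {
  // mask_a/mask_b are assumed to select disjoint bytes, with 0x80 in the
  // control bytes for the positions each shuffle should leave as zero.
  __m128i lo = _mm_shuffle_epi8(a, mask_a); // unselected bytes become zero
  __m128i hi = _mm_shuffle_epi8(b, mask_b);
  return _mm_or_si128(lo, hi);              // OR replaces the variable blend
}
```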
This patch uses partial DemandedElts masks to further simplify target shuffle chains and finally starts making target shuffle combining part of SimplifyDemandedBits/SimplifyDemandedVectorElts.
We already manage this for Depth == 0 cases, where combineX86ShuffleChain would early-out if the shuffle combined to the same op, but the patch generalizes this by manipulating the depth handling of combineX86ShufflesRecursively - calling with a new Depth = 0 and reducing the maximum shuffle combine depth accordingly.
Differential Revision: https://reviews.llvm.org/D66004
These instructions actually use a 512-byte location, where bytes 464-511 are ignored.
Reviewed By: craig.topper
Differential Revision: https://reviews.llvm.org/D86942
This requires adding a missing 'const' to the definition because
the callers are using const args, but there should be no change
in behavior.
The intrinsic method was added with D86798 / rG096527214033
There's a special case in hasAttribute for None when pImpl is null. If pImpl is not null we dispatch to pImpl->hasAttribute which will always return false for Attribute::None.
So if we just want to check for None, it's sufficient to just check that pImpl is null, which can even be done inline.
This patch adds a helper for that case which I hope will speed up our getSubtargetImpl implementations.
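A minimal sketch of the shape of such a helper (the class and method names below are stand-ins, not necessarily what the patch adds): a null pImpl already means "no attribute present", so the query can be answered inline without dispatching through pImpl->hasAttribute.

```
// Hypothetical illustration of the "pImpl == null means None" idea; the real
// LLVM class layout and helper name may differ.
class AttributeSketch {
  const void *pImpl = nullptr; // stand-in for the real implementation pointer
public:
  // True when an attribute is actually present, i.e. the negation of the old
  // "is it Attribute::None" check, and trivially inlineable.
  bool isValid() const { return pImpl != nullptr; }
};
```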
Differential Revision: https://reviews.llvm.org/D86744
AArch64, X86 and Mips currently consume these directly and custom
lower them to produce a libcall, but really these should follow the
normal legalization process through the libcall/lower action.
This is an older syntax than the {disp32} and {disp8} pseudo
prefixes that were added a few weeks ago. We can reuse most of
the support for that to support .d32 and .d8 as well.
mwaitx uses EBX as one of its arguments. Using this instruction
clobbers RBX as it is defined to hold one of the inputs. When the
backend uses a dynamically allocated stack, RBX is used as a
reserved register for the base pointer.
This patch is adapted from @qcolombet's patch for cmpxchg at r263325.
This fixes PR43528.
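A rough reproduction of the conflict described above (my own example; it assumes the _mm_monitorx/_mm_mwaitx intrinsics from clang's headers and a target built with -mmwaitx): the dynamic alloca forces RBX to be the base pointer while MWAITX also wants EBX for its timer operand.

```
#include <x86intrin.h>

void wait_with_dynamic_stack(void *line, unsigned n) {
  // Dynamically sized stack allocation -> the backend reserves RBX as the
  // base pointer for this function.
  char *buf = (char *)__builtin_alloca(n);
  buf[0] = 0;
  _mm_monitorx(line, 0, 0); // set up the monitored address
  _mm_mwaitx(0, 0, 10000);  // MWAITX also reads EBX (the timer value)
}
```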
Reviewed By: craig.topper
Differential Revision: https://reviews.llvm.org/D73475
Since we can only copy to GR32 we had to EXTRACT from GR32, but
we would first go to GR16 and then the truncate would extract again
to GR8. This adds a special case to go directly from GR32 to GR8.
This would eventually get cleaned up, but I thought maybe we should
avoid doing it in the first place. Our k-register handling is weird
and we could probably stand to have some more special ISD nodes
for the conversions so the i32 type would be explicit.
The IsExtractedElement check already called getOperand(0), so Extract
here is the source vector. We shouldn't call getOperand(0) again. This
worked for the original test cases because the result was a bitcast,
so the getOperand(0) accidentally peeked through the bitcast, which is
what we wanted.
In the failing case here, the operand turns out to be undef, so the
getOperand(0) asserts because undef has no operands.
Fixes https://bugs.chromium.org/p/oss-fuzz/issues/detail?id=25184
Differential Revision: https://reviews.llvm.org/D86428
KMOVWkr produces VK16, so there's no reason to copy it to VK16 again.
Test changes are presumably because we were scheduling based on
the COPY that is no longer there.
Support -march=sapphirerapids for x86.
Compared with Icelake Server, it includes 14 new features. They are
amxtile, amxint8, amxbf16, avx512bf16, avx512vp2intersect, cldemote,
enqcmd, movdir64b, movdiri, ptwrite, serialize, shstk, tsxldtrk, waitpkg.
Reviewed By: craig.topper
Differential Revision: https://reviews.llvm.org/D86503
This is preparation for making clang default to -mtune=generic when no -march is specified. This will allow the default tuning to be "generic" even though our default march is "pentium4" or "x86-64".
To avoid llc lit test regressions, if no mcpu is specified, I've defaulted tune to use i586 to match the old tuning settings of no CPU. Some tests explicitly used -mcpu=generic, which I've removed so they instead get this default of architecture features from generic and tuning from i586.
I updated one llvm-mca test to check a different CPU since generic has a scheduler model now.
Differential Revision: https://reviews.llvm.org/D86312
The following program miscompiles because rL216012 added support
for the static relocation model but not for PIC.
```
// clang -fpic -mcmodel=large -O0 a.cc
double foo() { return 42.0; }
```
This patch adds PIC support.
Reviewed By: craig.topper
Differential Revision: https://reviews.llvm.org/D86024
Add handling for storing the extracted lower (truncated bits) element from an X86ISD::VTRUNC node - this can be lowered to a generic truncated store directly.
Differential Revision: https://reviews.llvm.org/D86158
These instructions weren't in the initial version of MMX, but
were added when SSE1 was introduced. We already have the intrinsic
named correctly to include sse and the frontend header enforces
sse. We have one place in the backend where we DAG combine to
this intrinsic, but that's also qualified. So I don't know of anything
currently broken unless someone writes their own IR and doesn't
set the sse feature.
Allow non-VLX targets to use 512-bit VPERMV/VPERMV3 for 128/256-bit shuffles.
TBH I'm not sure these targets actually exist in the wild, but we're testing for them and it's good test coverage for shuffle lowering/combines across different subvector widths.