llvm-project

Commit Graph

Author	SHA1	Message	Date
Simon Pilgrim	8c4abd20eb	[InstCombine] Make x86 PADDS/PSUBS constant folding tests generic As discussed on D55894, this replaces the existing PADDS/PSUBUS intrinsics with the sadd/ssub.sat generic intrinsics and moves the tests out of the x86 subfolder. PR40110 has been raised to fix the regression with constant folding vectors containing undef elements. llvm-svn: 349759	2018-12-20 13:50:12 +00:00
Simon Pilgrim	420d1e12b6	Regenerate test llvm-svn: 349646	2018-12-19 17:24:34 +00:00
Sanjay Patel	2aa2dc76c2	[InstCombine] try to convert x86 movmsk intrinsic to generic IR (PR39927) call iM movmsk(sext <N x i1> X) --> zext (bitcast <N x i1> X to iN) to iM This has the potential to create less-than-8-bit scalar types as shown in some of the test diffs, but it looks like the backend knows how to deal with that in these patterns. This is the simple part of the fix suggested in: https://bugs.llvm.org/show_bug.cgi?id=39927 Differential Revision: https://reviews.llvm.org/D55529 llvm-svn: 348862	2018-12-11 16:38:03 +00:00
Sanjay Patel	5c00519c9a	[InstCombine] add tests for movmsk (PR39927) NFC llvm-svn: 348800	2018-12-10 21:44:20 +00:00
Sanjay Patel	3436dc2923	[InstCombine] drop poison flags in SimplifyVectorDemandedElts We established the (unfortunately complicated) rules for UB/poison propagation with vector ops in: D48893 D48987 D49047 It's clear from the affected tests that we are potentially creating poison where none existed before the transforms. For add/sub/mul, the answer is simple: just drop the flags because the extra undef vector lanes are generally more valuable for analysis and codegen. llvm-svn: 343819	2018-10-04 21:36:50 +00:00
Sanjay Patel	cafdeb1aa6	[InstCombine] allow SimplifyDemandedVectorElts to work with FP binops We're a long way from D50992 and D51553, but this is where we have to start. We weren't back-propagating undefs into binop constant values for anything but add/sub/mul/and/or/xor. This is likely because we have to be careful about not introducing UB/poison with div/rem/shift. But I suspect we already are getting the poison part wrong for add/sub/mul (although it may not be possible to expose the bug currently because we use SimplifyDemandedVectorElts from a limited set of opcodes). See the discussion/implementation from D48987 and D49047. This patch just enables functionality for FP ops because those do not have UB/poison potential. llvm-svn: 343727	2018-10-03 21:44:59 +00:00
Sanjay Patel	09e02fbf51	[InstCombine][x86] try even harder to convert blendv intrinsic to generic IR (PR38814) Follow-up to rL342324 (D52059): Missing optimizations with blendv are shown in: https://bugs.llvm.org/show_bug.cgi?id=38814 This is an easier and more powerful solution than adding pattern matching for a few special cases in the backend. The potential danger with this transform in IR is that the condition value can get separated from the select, and the backend might not be able to make a blendv out of it again. llvm-svn: 342806	2018-09-22 14:43:55 +00:00
Sanjay Patel	296d35a5e9	[InstCombine][x86] try harder to convert blendv intrinsic to generic IR (PR38814) Missing optimizations with blendv are shown in: https://bugs.llvm.org/show_bug.cgi?id=38814 If this works, it's an easier and more powerful solution than adding pattern matching for a few special cases in the backend. The potential danger with this transform in IR is that the condition value can get separated from the select, and the backend might not be able to make a blendv out of it again. I don't think that's too likely, but I've kept this patch minimal with a 'TODO', so we can test that theory in the wild before expanding the transform. Differential Revision: https://reviews.llvm.org/D52059 llvm-svn: 342324	2018-09-15 14:25:44 +00:00
Sanjay Patel	b437238e95	[InstCombine] add more tests for x86 blendv (PR38814); NFC llvm-svn: 342237	2018-09-14 13:47:33 +00:00
Sanjay Patel	caa4de72a2	[InstCombine][x86] add tests for possible blendv transform (PR38814); NFC llvm-svn: 341715	2018-09-07 21:40:41 +00:00
Tomasz Krupa	e766e5f636	[X86] Constant folding of adds/subs intrinsics Summary: This adds constant folding of signed add/sub with saturation intrinsics. Reviewers: craig.topper, spatel, RKSimon, chandlerc, efriedma Reviewed By: craig.topper Subscribers: rnk, llvm-commits Differential Revision: https://reviews.llvm.org/D50499 llvm-svn: 339659	2018-08-14 09:04:01 +00:00
Craig Topper	034adf2683	[X86] Remove and autoupgrade the scalar fma intrinsics with masking. This converts them to what clang is now using for codegen. Unfortunately, there seem to be a few kinks to work out still. I'll try to address with follow up patches. llvm-svn: 336871	2018-07-12 00:29:56 +00:00
Craig Topper	350c5f1881	[X86] Remove X86 specific scalar FMA intrinsics and upgrade to target independent FMA and extractelement/insertelement. llvm-svn: 336315	2018-07-05 06:52:55 +00:00
Craig Topper	31cbe75b3b	[X86] Rename the autoupgraded of packed fp compare and fpclass intrinsics that don't take a mask as input to exclude '.mask.' from their name. I think the intrinsics named 'avx512.mask.' should refer to the previous behavior of taking a mask argument in the intrinsic instead of using a 'select' or 'and' instruction in IR to accomplish the masking. This is more consistent with the goal that eventually we will have no intrinsics that have masking builtin. When we reach that goal, we should have no intrinsics named "avx512.mask". llvm-svn: 335744	2018-06-27 15:57:53 +00:00
Mikhail Dvoretckii	8393f90717	[InstCombine] Replacing X86-specific rounding intrinsics with generic floor-ceil This patch replaces calls to X86-specific intrinsics with floor-ceil semantics with calls to target-independent @llvm.floor.* and @llvm.ceil.* intrinsics. This doesn't affect the resulting machine code, as those intrinsics are lowered to the same instructions, but exposes these specific rounding cases to generic optimizations. Differential Revision: https://reviews.llvm.org/D48067 llvm-svn: 335039	2018-06-19 10:49:12 +00:00
Tomasz Krupa	bcaab53d47	[X86] Lowering sqrt intrinsics to native IR Summary: Complementary patch to lowering sqrt intrinsics in Clang. Reviewers: craig.topper, spatel, RKSimon, DavidKreitzer, uriel.k Reviewed By: craig.topper Subscribers: tkrupa, mike.dvoretsky, llvm-commits Differential Revision: https://reviews.llvm.org/D41599 llvm-svn: 334849	2018-06-15 18:05:24 +00:00
Mikhail Dvoretckii	0531ec654a	NFC: Regenerating x86-sse41.ll test for InstCombine Test regenerated to reduce noise in further patches. llvm-svn: 334806	2018-06-15 07:59:29 +00:00
Craig Topper	98a79934af	[X86] Remove masking from the 512-bit masked floating point add/sub/mul/div intrinsics. Use a select in IR instead. llvm-svn: 334358	2018-06-10 06:01:36 +00:00
Craig Topper	e4c045b7df	[X86] Remove mask arguments from permvar builtins/intrinsics. Use a select in IR instead. Someday maybe we'll use selects for all intrinsics. llvm-svn: 332824	2018-05-20 23:34:04 +00:00
Craig Topper	911025b1cd	[X86] Extend instcombine folds for pclmulqdq intrinsics to the 256 and 512 bit version. llvm-svn: 332202	2018-05-13 21:56:32 +00:00
Craig Topper	f170b85d40	[X86] Add missing test for the InstCombines of pclmulqdq. Apparently this test was lost when r293151 was committed. It was present in the review, but not the commit. llvm-svn: 332199	2018-05-13 18:26:06 +00:00
Craig Topper	a17d627abb	[X86] Remove and autoupgrade a bunch of FMA intrinsics that are no longer used by clang. llvm-svn: 332146	2018-05-11 21:59:34 +00:00
Sanjay Patel	30be665e82	[PatternMatch] allow undef elements when matching a vector zero This is the last step in getting constant pattern matchers to allow undef elements in constant vectors. I'm adding a dedicated m_ZeroInt() function and building m_Zero() from that. In most cases, calling code can be updated to use m_ZeroInt() directly when there's no need to match pointers, but I'm leaving that efficiency optimization as a follow-up step because it's not always clear when that's ok. There are just enough icmp folds in InstSimplify that can be used for integer or pointer types, that we probably still want a generic m_Zero() for those cases. Otherwise, we could eliminate it (and possibly add a m_NullPtr() as an alias for isa<ConstantPointerNull>()). We're conservatively returning a full zero vector (zeroinitializer) in InstSimplify/InstCombine on some of these folds (see diffs in InstSimplify), but I'm not sure if that's actually necessary in all cases. We may be able to propagate an undef lane instead. One test where this happens is marked with 'TODO'. llvm-svn: 330550	2018-04-22 17:07:44 +00:00
Craig Topper	254ed028a4	[X86] Remove the pmuldq/pmuludq intrinsics and replace with native IR. This completes the work started in r329604 and r329605 when we changed clang to no longer use the intrinsics. We lost some InstCombine SimplifyDemandedBits optimizations through this change as we aren't able to fold 'and', bitcast, shuffle very well. llvm-svn: 329990	2018-04-13 06:07:18 +00:00
Sanjay Patel	93e64dd9a1	[PatternMatch] allow undef elements when matching vector FP +0.0 This continues the FP constant pattern matching improvements from: https://reviews.llvm.org/rL327627 https://reviews.llvm.org/rL327339 https://reviews.llvm.org/rL327307 Several integer constant matchers also have this ability. I'm separating matching of integer/pointer null from FP positive zero and renaming/commenting to make the functionality clearer. llvm-svn: 328461	2018-03-25 21:16:33 +00:00
Sanjay Patel	c84b48ec29	[InstSimplify, InstCombine] add/update tests with FP +0.0 vector with undef; NFC llvm-svn: 328455	2018-03-25 17:48:20 +00:00
Sanjay Patel	2ee7b9349d	[ConstantFold] fp_binop undef, undef --> undef These are uncontroversial and independent of a proposed LangRef edit (D44216). I tried to fix tests that would fold away: rL327004 rL327028 rL327030 rL327034 I'm not sure if the Reassociate tests are meaningless yet, but they probably will be as we add more folds, so if anyone has suggestions or wants to fix those, please do. Differential Revision: https://reviews.llvm.org/D44258 llvm-svn: 327058	2018-03-08 20:42:49 +00:00
Craig Topper	4dccffc84a	[X86] Change signatures of avx512 packed fp compare intrinsics to return a vXi1 mask type to be closer to an fcmp. Summary: This patch changes the signature of the avx512 packed fp compare intrinsics to return a vXi1 vector and no longer take a mask as input. The casts to scalar type will now need to be explicit in the IR. The masking node will now be an explicit 'and' in the IR. This makes the intrinsic look much more similar to an fcmp instruction that we wish we could use for these but can't. We already use icmp instructions for integer compares. Previously the lowering step of isel would turn the intrinsic into an X86 specific ISD node and emit the masking nodes as well as some bitcasts. This means DAG combines can't see the vXi1 type until somewhat late, making it more difficult to combine out gpr<->mask transition sequences. By exposing the vXi1 type explicitly in the IR and initial SelectionDAG we give earlier DAG combines and even InstCombine the chance to see it and optimize it. This should make any issues with gpr<->mask sequences the same between integer and fp, meaning we only have to fix them once. Reviewers: spatel, delena, RKSimon, zvi Reviewed By: RKSimon Subscribers: llvm-commits Differential Revision: https://reviews.llvm.org/D43137 llvm-svn: 324827	2018-02-10 23:33:55 +00:00
Simon Pilgrim	472689a159	[InstCombine] Check for isa<Instruction> before using cast<> Protects against casts from constexpr etc. Reduced from oss-fuzz #4788 test case llvm-svn: 321515	2017-12-28 09:35:35 +00:00
Craig Topper	f264fcc704	[X86] Remove VPERM2F128/VPERM2I128 intrinsics and autoupgrade to native shuffles. I've moved the test cases from the InstCombine optimizations to the backend to keep the coverage we had there. It covered every possible immediate so I've preserved the resulting shuffle mask for each of those immediates. llvm-svn: 313450	2017-09-16 07:36:14 +00:00
Sanjay Patel	e6b48a1b02	[InstCombine] improve demanded vector elements analysis of insertelement Recurse instead of returning on the first found optimization. Also, return early in the caller instead of continuing because that allows another round of simplification before we might potentially lose undef information from a shuffle mask by eliminating the shuffle. As noted in the review, we could probably do better and be more efficient by moving all of demanded elements into a separate pass, but this is yet another quick fix to instcombine. Differential Revision: https://reviews.llvm.org/D37236 llvm-svn: 312248	2017-08-31 15:57:17 +00:00
Craig Topper	317a51e886	[X86][InstCombine] Add some simplifications for BZHI intrinsics This intrinsic clears the upper bits starting at a specified index. If the index is a constant we can do some simplifications. This could be in InstSimplify, but we don't handle any target specific intrinsics there today. Differential Revision: https://reviews.llvm.org/D36069 llvm-svn: 309604	2017-07-31 18:52:15 +00:00
Craig Topper	8324003818	[X86][InstCombine] Add basic simplification support for BEXTR/BEXTRI intrinsics. This patch adds simplification support for the BEXTR/BEXTRI intrinsics to match gcc. This only supports cases that fold to 0 or can be fully constant folded. Theoretically we could support converting to AND if the shift part is unused or to only a shift if the mask doesn't modify any bits after an equivalent shl. gcc doesn't do these transformations either. I put this in InstCombine, but it could be done in InstSimplify. It would be the first target specific intrinsic in InstSimplify. Differential Revision: https://reviews.llvm.org/D36063 llvm-svn: 309603	2017-07-31 18:52:13 +00:00
Justin Bogner	3c6fbad388	InstCombine: Move tests that use target intrinsics into subdirectories Tests with target intrinsics are inherently target specific, so it doesn't actually make sense to run them if we've excluded their target. llvm-svn: 302979	2017-05-13 05:39:46 +00:00

34 Commits