llvm-project

Commit Graph

Author	SHA1	Message	Date
Sanjay Patel	cc9c545fb4	[InstCombine] generalize subtract with 'not' operands; 2nd try This is a re-try of `3aa009cc87` which was reverted at `9577fac0fd` because it caused an infinite loop. For the extra test case, either re-ordering the transforms or adding the extra clause to avoid sub-of-sub is enough to prevent the infinite compile, but I'm doing both to be safer. Original commit message: The motivation was to get min/max intrinsics to parity with cmp+select idioms, but this unlocks a few more folds because isFreeToInvert recognizes add/sub with constants too. In the min/max example, we have too many extra uses for smaller folds to improve things, but this fold is able to eliminate uses even though we can't reduce the number of instructions.	2021-08-23 17:06:51 -04:00
Florian Hahn	9577fac0fd	Revert "[InstCombine] generalize subtract with 'not' operands" This reverts commit `3aa009cc87`. The reverted commit causes an infinite loop in instcombine. See PR51584.	2021-08-23 15:47:21 +01:00
Sanjay Patel	3aa009cc87	[InstCombine] generalize subtract with 'not' operands The motivation was to get min/max intrinsics to parity with cmp+select idioms, but this unlocks a few more folds because isFreeToInvert recognizes add/sub with constants too. In the min/max example, we have too many extra uses for smaller folds to improve things, but this fold is able to eliminate uses even though we can't reduce the number of instructions.	2021-08-22 07:18:31 -04:00
Sanjay Patel	41af8f0ad5	[InstCombine] combine constants by reassociating add/sub/add This may overlap partially with the reassociate pass, but it seems simple enough that we should try it here in InstCombine to enable other folds. This shows up as an opportunity and potential regression if we improve a subtract fold with 'not' ops to be more general.	2021-08-21 11:45:43 -04:00
Sanjay Patel	0e15de2d0c	[InstCombine] fold reassociative FP add into start value of fadd reduction This pattern is visible in unrolled and vectorized loops. Although the backend seems to be able to reassociate to ideal form in the examples I looked at, we might as well do that in IR for efficiency.	2021-07-18 06:26:20 -04:00
Sanjay Patel	d2012d965d	[InstCombine] fix nsz (fast-math) propagation from fneg-of-select As discussed in the post-commit comments for: `3cdd05e519` It seems to be safe to propagate all flags from the final fneg except for 'nsz' to the new select: https://alive2.llvm.org/ce/z/J_APDc nsz has unique FMF semantics: it is not poison, it is only "insignificant" in the calculation according to the LangRef.	2021-06-08 17:04:30 -04:00
Sanjay Patel	4675beaa21	[InstCombine] intersect nsz and ninf fast-math-flags (FMF) for fneg(fdiv) fold https://alive2.llvm.org/ce/z/3KPvih https://llvm.org/PR49654	2021-06-07 13:22:49 -04:00
Sanjay Patel	519e98cd9a	[InstCombine] refactor match clauses; NFC We need to adjust the FMF propagation on at least one of these transforms as discussed in: https://llvm.org/PR49654 ...so this should make it easier to intersect flags.	2021-06-07 13:22:49 -04:00
Sanjay Patel	3cdd05e519	[InstCombine] fold fnegs around select This is one of the folds requested in: https://llvm.org/PR39480 https://alive2.llvm.org/ce/z/NczU3V Note - this uses the normal FMF propagation logic (flags transfer from the final value to new/intermediate ops). It's not clear if this matches what Alive2 implements, so we may want to adjust one or the other.	2021-05-17 14:53:49 -04:00
Dávid Bolvanský	691badc3d6	[InstCombine] C - ctpop(a) - > ctpop(~a)) if C is bitwidth (PR50104) Proof: https://alive2.llvm.org/ce/z/mncA9K Solves https://bugs.llvm.org/show_bug.cgi?id=50104 Reviewed By: spatel Differential Revision: https://reviews.llvm.org/D101257	2021-04-26 15:40:54 +02:00
Dávid Bolvanský	d4ec8ea19c	[InstCombine] ctpop(X) + ctpop(Y) => ctpop(X \| Y) if X and Y have no common bits (PR48999) For example: ``` int src(unsigned int a, unsigned int b) { return __builtin_popcount(a << 16) + __builtin_popcount(b >> 16); } int tgt(unsigned int a, unsigned int b) { return __builtin_popcount((a << 16) \| (b >> 16)); } ``` Reviewed By: nikic Differential Revision: https://reviews.llvm.org/D101210	2021-04-24 17:52:10 +02:00
Dávid Bolvanský	9aee07abd0	[InstCombine] X - usub.sat(X, Y) => umin(X, Y) Pattern regressed in LLVM 9 with the introduction of usub.sat. Fixes https://bugs.llvm.org/show_bug.cgi?id=42178#c2 Reviewed By: nikic Differential Revision: https://reviews.llvm.org/D101184	2021-04-23 21:13:07 +02:00
Roman Lebedev	a36bb7fd76	[InstCombine] (X \| Op01C) + Op1C --> X + (Op01C + Op1C) iff the or is actually an add https://alive2.llvm.org/ce/z/Coc5yf	2021-04-11 18:08:08 +03:00
Roman Lebedev	24f67473dd	[InstCombine] foldAddWithConstant(): don't deal with non-immediate constants All of the code that handles general constant here (other than the more restrictive APInt-dealing code) expects that it is an immediate, because otherwise we won't actually fold the constants, and increase instruction count. And it isn't obvious why we'd be okay with increasing the number of constant expressions, those still will have to be run.. But after `2829094a8e` this could also cause endless combine loops. So actually properly restrict this code to immediates.	2021-04-07 19:50:19 +03:00
Roman Lebedev	2829094a8e	Reland [InstCombine] Fold `((X - Y) - Z)` to `X - (Y + Z)` (PR49858) This reverts commit `a547b4e26b`, relanding commit `31d219d299`, which was reverted because there was a conflicting inverse transform, which was causing an endless combine loop, which has now been adjusted. Original commit message: https://alive2.llvm.org/ce/z/67w-wQ We prefer `add`s over `sub`, and this particular xform allows further folds to happen: Fixes https://bugs.llvm.org/show_bug.cgi?id=49858	2021-04-07 12:06:25 +03:00
Roman Lebedev	93d1d94b74	[InstCombine] Restrict "C-(X+C2) --> (C-C2)-X" fold to immediate constants I.e., if any/all of the consants is an expression, don't do it. Since those constants won't reduce into an immediate, but would be left as an constant expression, they could cause endless combine loops after `31d219d299` added an inverse transformation.	2021-04-07 12:06:24 +03:00
Petr Hosek	a547b4e26b	Revert "[InstCombine] Fold `((X - Y) - Z)` to `X - (Y + Z)` (PR49858)" This reverts commit `31d219d299` which causes an infinite loop when compiling the XRay runtime.	2021-04-06 22:30:28 -07:00
Roman Lebedev	31d219d299	[InstCombine] Fold `((X - Y) - Z)` to `X - (Y + Z)` (PR49858) https://alive2.llvm.org/ce/z/67w-wQ We prefer `add`s over `sub`, and this particular xform allows further folds to happen: Fixes https://bugs.llvm.org/show_bug.cgi?id=49858	2021-04-06 15:58:14 +03:00
Dávid Bolvanský	ae69fa9b9f	[InstCombine] Transform (A + B) - (A & B) to A \| B (PR48604) define i32 @src(i32 %x, i32 %y) { %0: %a = add i32 %x, %y %o = and i32 %x, %y %r = sub i32 %a, %o ret i32 %r } => define i32 @tgt(i32 %x, i32 %y) { %0: %b = or i32 %x, %y ret i32 %b } Transformation seems to be correct! https://alive2.llvm.org/ce/z/2fhW6r	2020-12-31 15:04:32 +01:00
Dávid Bolvanský	742ea77ca4	[InstCombine] Transform (A + B) - (A \| B) to A & B (PR48604) define i32 @src(i32 %x, i32 %y) { %0: %a = add i32 %x, %y %o = or i32 %x, %y %r = sub i32 %a, %o ret i32 %r } => define i32 @tgt(i32 %x, i32 %y) { %0: %b = and i32 %x, %y ret i32 %b } Transformation seems to be correct! https://alive2.llvm.org/ce/z/aQRh2j	2020-12-31 14:03:20 +01:00
Roman Lebedev	b3021a72a6	[IR][InstCombine] Add m_ImmConstant(), that matches on non-ConstantExpr constants, and use it A pattern to ignore ConstantExpr's is quite common, since they frequently lead into infinite combine loops, so let's make writing it easier.	2020-12-24 21:20:47 +03:00
Reid Kleckner	d2ed9d6b7e	Revert "ADT: Migrate users of AlignedCharArrayUnion to std::aligned_union_t, NFC" We determined that the MSVC implementation of std::aligned* isn't suited to our needs. It doesn't support 16 byte alignment or higher, and it doesn't really guarantee 8 byte alignment. See https://github.com/microsoft/STL/issues/1533 Also reverts "ADT: Change AlignedCharArrayUnion to an alias of std::aligned_union_t, NFC" Also reverts "ADT: Remove AlignedCharArrayUnion, NFC" to bring back AlignedCharArrayUnion. This reverts commit `4d8bf870a8`. This reverts commit `d10f9863a5`. This reverts commit `4b5dc150b9`.	2020-12-14 17:04:06 -08:00
Duncan P. N. Exon Smith	d10f9863a5	ADT: Migrate users of AlignedCharArrayUnion to std::aligned_union_t, NFC Prepare to delete `AlignedCharArrayUnion` by migrating its users over to `std::aligned_union_t`. I will delete `AlignedCharArrayUnion` and its tests in a follow-up commit so that it's easier to revert in isolation in case some downstream wants to keep using it. Differential Revision: https://reviews.llvm.org/D92516	2020-12-04 12:34:49 -08:00
Duncan P. N. Exon Smith	5b267fb796	ADT: Stop peeking inside AlignedCharArrayUnion, NFC Update all the users of `AlignedCharArrayUnion` to stop peeking inside (to look at `buffer`) so that a follow-up patch can replace it with an alias to `std::aligned_union_t`. This was reviewed as part of https://reviews.llvm.org/D92512, but I'm splitting this bit out to commit first to reduce churn in case the change to `AlignedCharArrayUnion` needs to be reverted for some unexpected reason.	2020-12-04 11:07:42 -08:00
Sanjay Patel	678b9c5dde	[InstCombine] try difference-of-shifts factorization before negator We need to preserve wrapping flags to allow better folds. The cases with geps may be non-intuitive, but that appears to agree with Alive2: https://alive2.llvm.org/ce/z/JQcqw7 We create 'nsw' ops independent from the original wrapping on the sub.	2020-11-24 13:56:30 -05:00
Sanjay Patel	ab29f091eb	[InstCombine] propagate 'nsw' on pointer difference of 'inbounds' geps This is a retry of `324a53205`. I cautiously reverted that at `6aa3fc4` because the rules about gep math were not clear. Since then, we have added this line to LangRef for gep inbounds: "The successive addition of offsets (without adding the base address) does not wrap the pointer index type in a signed sense (nsw)." See D90708 and post-commit comments on the revert patch for more details.	2020-11-23 16:50:09 -05:00
Nikita Popov	02dda1c659	[Local] Clean up EmitGEPOffset Handle the emission of the add in a single place, instead of three different ones. Don't emit an unnecessary add with zero to start with. It will get dropped by InstCombine, but we may as well not create it in the first place. This also means that InstCombine does not need to specially handle this extra add. This is conceptually NFC, but can affect worklist order etc.	2020-11-13 18:30:56 +01:00
Sanjay Patel	0abde4bc92	[InstCombine] fold sub of low-bit masked value from offset of same value There might be some demanded/known bits way to generalize this, but I'm not seeing it right now. This came up as a regression when I was looking at a different demanded bits improvement. https://rise4fun.com/Alive/5fl Name: general Pre: ((-1 << countTrailingZeros(C1)) & C2) == 0 %a1 = add i8 %x, C1 %a2 = and i8 %x, C2 %r = sub i8 %a1, %a2 => %r = and i8 %a1, ~C2 Name: test 1 %a1 = add i8 %x, 192 %a2 = and i8 %x, 10 %r = sub i8 %a1, %a2 => %r = and i8 %a1, -11 Name: test 2 %a1 = add i8 %x, -108 %a2 = and i8 %x, 3 %r = sub i8 %a1, %a2 => %r = and i8 %a1, -4	2020-11-12 20:10:28 -05:00
Roman Lebedev	c009d11bda	[InstCombine] Perform C-(X+C2) --> (C-C2)-X transform before using Negator In particular, it makes it fire for C=0, because negator doesn't want to perform that fold since in general it's not beneficial.	2020-11-03 16:06:52 +03:00
Sanjay Patel	3f3356bdd9	[InstCombine] allow vector splats for add+xor --> shifts	2020-10-11 09:04:24 -04:00
Sanjay Patel	f81200ae99	[InstCombine] add one-use check to add+xor transform As shown in the affected test, we could increase instruction count without this limitation. There's another test with extra use that shows we still convert directly to a real "sext" if possible.	2020-10-11 09:04:24 -04:00
Sanjay Patel	080e6bc205	[InstCombine] allow vector splats for add+and with high-mask There might be a better way to specify the pre-conditions, but this is hopefully clearer than the way it was written: https://rise4fun.com/Alive/Jhk3 Pre: C2 < 0 && isShiftedMask(C2) && (C1 == C1 & C2) %a = and %x, C2 %r = add %a, C1 => %a2 = add %x, C1 %r = and %a2, C2	2020-10-09 10:39:11 -04:00
Sanjay Patel	f688ae7a0e	[InstCombine] allow vector splats for add+xor with low-mask This can be allowed with undef elements too, but that can be another step: https://alive2.llvm.org/ce/z/hnC4Z-	2020-10-08 15:53:38 -04:00
Sanjay Patel	5ac89add1e	[InstCombine] remove unnecessary one-use check from add-xor transform Pre-conditions seem to be optimal, but we don't need a use check because we are only replacing an add with a sub. https://rise4fun.com/Alive/hzN Pre: (~C1 \| C2 == -1) && isPowerOf2(C2+1) %m = and i8 %x, C1 %f = xor i8 %m, C2 %r = add i8 %f, C3 => %r = sub i8 C2 + C3, %m	2020-10-08 15:08:51 -04:00
Sanjay Patel	b57451b011	[InstCombine] allow vector splats for add+xor with signmask	2020-10-08 10:46:34 -04:00
Amara Emerson	322d0afd87	[llvm][mlir] Promote the experimental reduction intrinsics to be first class intrinsics. This change renames the intrinsics to not have "experimental" in the name. The autoupgrader will handle legacy intrinsics. Relevant ML thread: http://lists.llvm.org/pipermail/llvm-dev/2020-April/140729.html Differential Revision: https://reviews.llvm.org/D88787	2020-10-07 10:36:44 -07:00
Martin Storsjö	2c4c659666	[InstCombine] Add parentheses in assert to silence GCC warning. NFC.	2020-09-23 09:03:01 +03:00
Sanjay Patel	7903ae4720	[InstCombine] factorize left shifts of add/sub We do similar factorization folds in SimplifyUsingDistributiveLaws, but that drops no-wrap properties. Propagating those optimally may help solve: https://llvm.org/PR47430 The propagation is all-or-nothing for these patterns: when all 3 incoming ops have nsw or nuw, the 2 new ops should have the same no-wrap property: https://alive2.llvm.org/ce/z/Dv8wsU This also solves: https://llvm.org/PR47584	2020-09-20 12:55:24 -04:00
Sanjay Patel	6aa3fc4a5b	Revert "[InstCombine] propagate 'nsw' on pointer difference of 'inbounds' geps (PR47430)" This reverts commit `324a53205a`. On closer examination of at least one of the test diffs, this does not appear to be correct in all cases. Even the existing 'nsw' creation may be wrong based on this example: https://alive2.llvm.org/ce/z/uL4Hw9 https://alive2.llvm.org/ce/z/fJMKQS	2020-09-11 10:54:48 -04:00
Sanjay Patel	324a53205a	[InstCombine] propagate 'nsw' on pointer difference of 'inbounds' geps (PR47430) There's no signed wrap if both geps have 'inbounds': https://alive2.llvm.org/ce/z/nZkQTg https://alive2.llvm.org/ce/z/7qFauh	2020-09-11 10:39:09 -04:00
Sanjay Patel	8b30067919	[InstCombine] improve fold of pointer differences This was supposed to be an NFC cleanup, but there's a real logic difference (did not drop 'nsw') visible in some tests in addition to an efficiency improvement. This is because in the case where we have 2 GEPs, the code was always swapping the operands and negating the result. But if we have 2 GEPs, we should never need swapping/negation AFAICT. This is part of improving flags propagation noticed with PR47430.	2020-09-07 15:54:32 -04:00
Sanjay Patel	3ca8b9a560	[InstCombine] give a name to an intermediate value for easier tracking; NFC As noted in PR47430, we probably want to conditionally include 'nsw' here anyway, so we are going to need to fill out the optional args.	2020-09-07 08:19:42 -04:00
Nikita Popov	57a26bb7b4	[InstCombine] Fix typo in comment (NFC) As pointed out in post-commit review of D63060.	2020-08-29 10:17:17 +02:00
Nikita Popov	ffe05dd125	[InstCombine] usub.sat(a, b) + b => umax(a, b) (PR42178) Fixes https://bugs.llvm.org/show_bug.cgi?id=42178 by folding usub.sat(a, b) + b to umax(a, b). The backend will expand umax back to usubsat if that is profitable. We may also want to handle uadd.sat(a, b) - b in the future. Differential Revision: https://reviews.llvm.org/D63060	2020-08-28 21:52:29 +02:00
Sebastian Neubauer	2a6c871596	[InstCombine] Move target-specific inst combining For a long time, the InstCombine pass handled target specific intrinsics. Having target specific code in general passes was noted as an area for improvement for a long time. D81728 moves most target specific code out of the InstCombine pass. Applying the target specific combinations in an extra pass would probably result in inferior optimizations compared to the current fixed-point iteration, therefore the InstCombine pass resorts to newly introduced functions in the TargetTransformInfo when it encounters unknown intrinsics. The patch should not have any effect on generated code (under the assumption that code never uses intrinsics from a foreign target). This introduces three new functions: TargetTransformInfo::instCombineIntrinsic TargetTransformInfo::simplifyDemandedUseBitsIntrinsic TargetTransformInfo::simplifyDemandedVectorEltsIntrinsic A few target specific parts are left in the InstCombine folder, where it makes sense to share code. The largest left-over part in InstCombineCalls.cpp is the code shared between arm and aarch64. This allows to move about 3000 lines out from InstCombine to the targets. Differential Revision: https://reviews.llvm.org/D81728	2020-07-22 15:59:49 +02:00
Sanjay Patel	8953ecf22b	[InstCombine] reassociate diff of sums into sum of diffs This is the integer sibling to D81491. (a[0] + a[1] + a[2] + a[3]) - (b[0] + b[1] + b[2] +b[3]) --> (a[0] - b[0]) + (a[1] - b[1]) + (a[2] - b[2]) + (a[3] - b[3]) Removing the "experimental" from these intrinsics is likely not too far away.	2020-06-22 20:47:09 -04:00
Sanjay Patel	b5fb26951a	[InstCombine] reassociate FP diff of sums into sum of diffs (a[0] + a[1] + a[2] + a[3]) - (b[0] + b[1] + b[2] +b[3]) --> (a[0] - b[0]) + (a[1] - b[1]) + (a[2] - b[2]) + (a[3] - b[3]) This should be the last step in solving PR43953: https://bugs.llvm.org/show_bug.cgi?id=43953 We started emitting reduction intrinsics with: D80867/ rGe50059f6b6b3 So it's a relatively easy pattern match now to re-order those ops. Also, I have not seen any complaints for the switch to intrinsics yet, so I'll propose to remove the "experimental" tag from the intrinsics soon. Differential Revision: https://reviews.llvm.org/D81491	2020-06-14 09:09:03 -04:00
EgorBo	012909dcaf	[InstCombine] "X - (X / C) * C == 0" to "X & C-1 == 0" Summary: "X % C == 0" is optimized to "X & C-1 == 0" (where C is a power-of-two) However, "X % Y" can also be represented as "X - (X / Y) * Y" so if I rewrite the initial expression: "X - (X / C) * C == 0" it's not currently optimized to "X & C-1 == 0", see godbolt: https://godbolt.org/z/KzuXUj This is my first contribution to LLVM so I hope I didn't mess things up Reviewers: lebedev.ri, spatel Reviewed By: lebedev.ri Subscribers: hiraditya, llvm-commits Tags: #llvm Differential Revision: https://reviews.llvm.org/D79369	2020-06-12 10:20:06 +03:00
Sanjay Patel	1a2bffaf8b	[InstCombine] reassociate sub+add to increase adds and throughput The -reassociate pass tends to transform this kind of pattern into something that is worse for vectorization and codegen. See PR43953: https://bugs.llvm.org/show_bug.cgi?id=43953 Follows-up the FP version of the same transform: rGa0ce2338a083	2020-05-26 14:49:17 -04:00
Sanjay Patel	a0ce2338a0	[InstCombine] reassociate fsub+fadd with FMF to increase adds and throughput The -reassociate pass tends to transform this kind of pattern into something that is worse for vectorization and codegen. See PR43953: https://bugs.llvm.org/show_bug.cgi?id=43953	2020-05-26 13:17:15 -04:00

1 2 3 4 5 ...

325 Commits