llvm-project

Commit Graph

Author	SHA1	Message	Date
Kazu Hirata	aa8feeefd3	Don't use Optional::hasValue (NFC)	2022-06-25 11:55:57 -07:00
chenglin.bi	30e49a3794	[InstCombine] Optimise shift+and+boolean conversion pattern to simple comparison if (`C1` is pow2) & (`(C2 & ~(C1-1)) + C1)` is pow2): ((C1 << X) & C2) == 0 -> X >= (Log2(C2+C1) - Log2(C1)); https://alive2.llvm.org/ce/z/EJAl1R ((C1 << X) & C2) != 0 -> X < (Log2(C2+C1) - Log2(C1)); https://alive2.llvm.org/ce/z/3bVRVz And remove dead code. Fix: https://github.com/llvm/llvm-project/issues/56124 Reviewed By: spatel Differential Revision: https://reviews.llvm.org/D126591	2022-06-23 21:53:07 +08:00
Kazu Hirata	7a47ee51a1	[llvm] Don't use Optional::getValue (NFC)	2022-06-20 22:45:45 -07:00
Kazu Hirata	e0e687a615	[llvm] Don't use Optional::hasValue (NFC)	2022-06-20 10:38:12 -07:00
Chenbing Zheng	0eff6c6ba8	[InstCombine] add vector support for (A >> C) == (B >> C) --> (A^B) u< (1 << C) Reviewed By: spatel, RKSimon Differential Revision: https://reviews.llvm.org/D127398	2022-06-20 10:55:47 +08:00
Eric Gullufsen	73202130e5	[InstCombine] Optimize test for same-sign of values (icmp slt (X & Y), 0) \| (icmp sgt (X \| Y), -1) -> (icmp sgt (X ^ Y), -1) (icmp slt (X \| Y), 0) & (icmp sgt (X & Y), -1) -> (icmp slt (X ^ Y), 0) [[ https://alive2.llvm.org/ce/z/qXxEFP \| alive2 example ]] [[ https://godbolt.org/z/aWf9c6j74 \| godbolt ]] [[ https://godbolt.org/z/5Ydn5TehY \| godbolt for inverted form ]] [[ https://alive2.llvm.org/ce/z/93AODr \| alive2 for inverted form ]] [[ https://github.com/llvm/llvm-project/issues/55988 \| issue #55988 ]] Differential Revision: https://reviews.llvm.org/D127903	2022-06-19 16:18:19 -04:00
Sanjay Patel	0399473de8	[InstCombine] add fold for (ShiftC >> X) <u C https://alive2.llvm.org/ce/z/RcdzM- This fixes a regression noted in issue #56046.	2022-06-19 11:03:28 -04:00
Kazu Hirata	129b531c9c	[llvm] Use value_or instead of getValueOr (NFC)	2022-06-18 23:07:11 -07:00
Sanjay Patel	bfde861935	[InstCombine] convert mask and shift of power-of-2 to cmp+select When the mask is a power-of-2 constant and op0 is a shifted-power-of-2 constant, test if the shift amount equals the offset bit index: (ShiftC << X) & C --> X == (log2(C) - log2(ShiftC)) ? C : 0 (ShiftC >> X) & C --> X == (log2(ShiftC) - log2(C)) ? C : 0 This is an alternate to D127610 with a more general pattern. We match only shift+and instead of the trailing xor, so we see a few more tests diffs. I think we discussed this initially in D126617. Here are proofs for shifts in both directions: https://alive2.llvm.org/ce/z/CFrLs4 The test diffs look equal or better for IR, and this makes the patterns more uniform in IR. The backend can partially invert this in both cases if that is profitable. It is not trivially reversible, however, so if we find perf regressions that are not easy to undo, then we may want to revert this. Differential Revision: https://reviews.llvm.org/D127801	2022-06-17 10:51:57 -04:00
Nikita Popov	c6b88cb918	[InstCombine] Push freeze through recurrence phi We really want to push freezes through recurrence phis, so that we freeze only the start value, rather than the IV value on every iteration. foldOpIntoPhi() already handles this for the case where the transfer function doesn't produce poison, e.g. %iv.next = add %iv, 1. However, this does not work if nowrap flags are present, e.g. the very common %iv.next = add nuw %iv, 1 case. This patch adds a fold that pushes freeze instructions to the start value by checking whether all backedge values will be non-poison after poison generating flags have been dropped. This allows pushing freezes out of loops in most cases. I suspect that this also obsoletes the CanonicalizeFreezeInLoops pass, and we can probably drop it. Fixes https://github.com/llvm/llvm-project/issues/56048. Differential Revision: https://reviews.llvm.org/D127960	2022-06-17 15:01:41 +02:00
Heejin Ahn	b2f4112f25	[InstCombine] Improve check for catchswitch BBs (NFC) Reviewed By: nikic Differential Revision: https://reviews.llvm.org/D127810	2022-06-15 01:06:13 -07:00
chenglin.bi	286198ff04	[InstCombine] Optimize lshr+shl+and conversion pattern if `C1` and `C3` are pow2 and `Log2(C3) >= C2`: ((C1 >> X) << C2) & C3 -> X == (Log2(C1)+C2-Log2(C3)) ? C3 : 0 https://alive2.llvm.org/ce/z/zvrkKF Reviewed By: spatel Differential Revision: https://reviews.llvm.org/D127469	2022-06-14 11:06:10 +08:00
Heejin Ahn	ac4006b0d6	[InstCombine] Don't slice up PHIs when pred BB has catchswitch If an integer PHI has an illegal type (according to the data layout) and it is only used by `trunc` or `trunc(lshr)` operations, we split the PHI into various instructions in its predecessors: `6d1543a167/llvm/lib/Transforms/InstCombine/InstCombinePHI.cpp (L1536-L1543)` So this can produce code like the following: Before: ``` pred: ... bb: %p = phi i8 [ %somevalue, %pred ], ... ... %tobool = trunc i8 %p to i1 use %tobool ... ``` In this code, `%p` has an illegal integer type, `i8`, and it's only used in a `trunc` instruction later. In this case this pass puts extraction code in its predecessors: After: ``` pred: ... %t = and i8 %somevalue, 1 %extract = icmp ne i8 %t, 0 bb: %p.new = phi i1 [ %extract, %pred ], ... use %p.new instead of %tobool ``` But this doesn't work if `pred` is a `catchswitch` BB because it cannot have any non-PHI instructions. This CL ensures we bail out in that case. Fixes https://github.com/llvm/llvm-project/issues/55803. Reviewed By: dschuff Differential Revision: https://reviews.llvm.org/D127699	2022-06-13 18:32:09 -07:00
Sanjay Patel	310adb658c	[InstCombine] reorder mask folds for efficiency This shows narrowing improvements on the logic tests (transforms recently added with `e247b0e5c9`). This is not a complete fix. That would require adding folds to visitOr/visitXor. But it enables the expected transforms for the basic patterns in the affected tests.	2022-06-13 09:49:57 -04:00
David Sherwood	83251896d7	[NFC][InstCombine] Refactor InstCombinerImpl::foldSelectIntoOp Introduce a lambda function so that we remove a lot of code duplication. Differential Revision: https://reviews.llvm.org/D127493	2022-06-13 10:37:07 +01:00
Nikita Popov	92a9b1c918	[InstCombine] Don't push operation across loop phi When pushing an operation across a phi node, we should avoid doing so across a loop backedge. This is generally non-profitable, because it does not reduce the number of times the operation is executed, and could lead to an infinite combine loop. The code was already guarding against this, but using an insufficiently strong condition, which did not cover the case where the operation was originally outside the loop (in which case the transform moves the operation from outside the loop into the loop, which is particularly undesirable). Differential Revision: https://reviews.llvm.org/D127499	2022-06-13 10:48:09 +02:00
Nuno Lopes	e5c5f92e12	[InstCombine] switch synthetic unreachable to use undef instead of poison (NFC)	2022-06-10 21:54:09 +01:00
Sanjay Patel	e247b0e5c9	[InstCombine] add narrowing transform for low-masked binop with zext operand (2nd try) The 1st try ( `afa192cfb6` ) was reverted because it could cause an infinite loop with constant expressions. A test for that and an extra condition to enable the transform are added now. I also added code comments to better describe the transform and the existing, related transform. Original commit message: https://alive2.llvm.org/ce/z/hRy3rE As shown in D123408, we can produce this pattern when moving casts around, and we already have a related fold for a binop with a constant operand.	2022-06-10 12:42:27 -04:00
Guillaume Chatelet	dc9c2eac98	[NFC][Alignment] Simplify code	2022-06-10 15:25:28 +00:00
Sanjay Patel	6fedc6a2b4	Revert "[InstCombine] add narrowing transform for low-masked binop with zext operand" This reverts commit `afa192cfb6`. This can cause an infinite loop as shown with an example in the post-commit thread.	2022-06-10 08:25:10 -04:00
David Sherwood	8daaea206b	[InstCombine] Use +0.0 instead of -0.0 as the FP identity for some folds In foldSelectIntoOp we sometimes transform a select of a fadd into a fadd of a select, where we select between data and an identity value. For both fadd and fsub the identity is always -0.0, but if the nsz flag is set on the select instruction we can use +0.0 instead. Doing so then triggers other optimisations, such as when folding the select of masked load into a new masked load. Differential Revision: https://reviews.llvm.org/D126774	2022-06-10 12:42:34 +01:00
chenglin.bi	de7a6ae1ff	[InstCombine] Optimize shl+lshr+and conversion pattern if `C1` and `C3` are pow2 and `Log2(C3)+C2 < BitWidth`: ((C1 << X) >> C2) & C3 -> X == (Log2(C3)+C2-Log2(C1)) ? C3 : 0; https://alive2.llvm.org/ce/z/Pus5bd Fix issue https://github.com/llvm/llvm-project/issues/55739 Reviewed By: spatel Differential Revision: https://reviews.llvm.org/D126617	2022-06-10 09:36:58 +08:00
Sanjay Patel	afa192cfb6	[InstCombine] add narrowing transform for low-masked binop with zext operand https://alive2.llvm.org/ce/z/hRy3rE As shown in D123408, we can produce this pattern when moving casts around, and we already have a related fold for a binop with a constant operand.	2022-06-09 16:59:26 -04:00
Simon Moll	b8c2781ff6	[NFC] format InstructionSimplify & lowerCaseFunctionNames Clang-format InstructionSimplify and convert all "FunctionName"s to "functionName". This patch does touch a lot of files but gets done with the cleanup of InstructionSimplify in one commit. This is the alternative to the less invasive clang-format only patch: D126783 Reviewed By: spatel, rengolin Differential Revision: https://reviews.llvm.org/D126889	2022-06-09 16:10:08 +02:00
Biplob Mishra	d87bfa9ad0	[InstCombine] Combine instructions of type or/and where AND masks can be combined. The patch simplifies some of the patterns as below (A \| (B & C0)) \| (B & C1) -> A \| (B & C0\|C1) ((B & C0) \| A) \| (B & C1) -> (B & C0\|C1) \| A In some scenarios like byte reverse on half word, we can see this pattern multiple times and this conversion can optimize these patterns. Additionally this commit fixes the issue reported with the test case. int f(int a, int b) { int c = ((unsigned char)(a >> 23) & 925); if (a) c = (a >> 23 & b) \| ((unsigned char)(a >> 23) & 925) \| (b >> 23 & 157); return c; } The previous revision/commit did not check one-use of an intermediate value that this transform re-uses. When that value has another use, an existing transform will try to invert the transform here. By adding one-use checks, we avoid the infinite loops seen with the earlier commit. Differential Revision: https://reviews.llvm.org/D124119	2022-06-09 10:58:30 +01:00
Chenbing Zheng	38992d2c5e	[InstCombine] improve fold for icmp-ugt-ashr The existing condition for the fold icmp ugt (ashr X, ShAmtC), C --> icmp ugt X, ((C + 1) << ShAmtC) - 1 missed some boundary cases. As a result the fold did not apply in some cases; the cause is signed number overflow. Reviewed By: spatel Differential Revision: https://reviews.llvm.org/D127188	2022-06-09 16:22:12 +08:00
Wael Yehia	0952cf5bbb	[InstCombine] decomposeSimpleLinearExpr should bail out on negative operands. InstCombine tries to rewrite %prod = mul nsw i64 %X, Scale %acc = add nsw i64 %prod, Offset %0 = alloca i8, i64 %acc, align 4 %1 = bitcast i8* %0 to i32* Use ( %1 ) into %prod = mul nsw i64 %X, Scale/4 %acc = add nsw i64 %prod, Offset/4 %0 = alloca i32, i64 %acc, align 4 Use (%0) But it assumes Scale is unsigned, and performs an unsigned division. So we should bail out if Scale cannot be interpreted as an unsigned safely. Reviewed By: nikic Differential Revision: https://reviews.llvm.org/D126546	2022-06-08 00:57:25 +00:00
Sanjay Patel	cae993d4c8	[InstCombine] reduce left-shift-of-right-shifted constant via demanded bits If we don't demand low bits and it is valid to pre-shift a constant: (C2 >> X) << C1 --> (C2 << C1) >> X https://alive2.llvm.org/ce/z/_UzTMP This is the reverse-order shift sibling to `82040d414b` ( D127122 ). It seems likely that we would want to add this to the SDAG version of the code too to keep it on par with IR.	2022-06-07 18:43:27 -04:00
Sanjay Patel	a4d2c5ecaa	[InstCombine] reduce code duplication for accessing type; NFC	2022-06-07 18:43:27 -04:00
Sanjay Patel	82040d414b	[InstCombine] reduce right-shift-of-left-shifted constant via demanded bits If we don't demand high bits (zeros) and it is valid to pre-shift a constant: (C2 << X) >> C1 --> (C2 >> C1) << X https://alive2.llvm.org/ce/z/P3dWDW There are a variety of related patterns, but I haven't found a single solution that gets all of the motivating examples - so pulling this piece out of D126617 along with more tests. We should also handle the case where we shift-right followed by shift-left, but I'll make that a follow-on patch assuming this one is ok. It seems likely that we would want to add this to the SDAG version of the code too to keep it on par with IR. Differential Revision: https://reviews.llvm.org/D127122	2022-06-07 13:28:18 -04:00
Sanjay Patel	3f33d67d8a	[InstCombine] fold mul with masked low bit operand to trunc+select https://alive2.llvm.org/ce/z/o7rQ5q This shows an extra instruction in some cases, but that is caused by an existing canonicalization of trunc -> and+icmp. Codegen should be better for any target where a multiply is more costly than the most simple ALU op. This ends up producing the requested x86 asm from issue #55618, but it's not the same IR. We are missing a canonicalization from the negate+mask pattern to the trunc+select created here.	2022-06-05 20:07:18 -04:00
Sanjay Patel	8689463bfb	[InstCombine] make pattern matching more consistent; NFC We could go either way on this and several similar matches. Just matching as a binop is possibly slightly more efficient; we don't need to re-confirm the opcode of the instruction.	2022-06-02 16:01:23 -04:00
Alexander Kornienko	aa98e7e1eb	Revert "[InstCombine] Combine instructions of type or/and where AND masks can be combined." This reverts commit `ec4adf1f6c`. The commit causes clang to hang on a certain input: ``` $ cat q.cc int f(int a, int b) { int c = ((unsigned char)(a >> 23) & 925); if (a) c = (a >> 23 & b) \| ((unsigned char)(a >> 23) & 925) \| (b >> 23 & 157); return c; } $ time ./clang-15-10515 --target=x86_64--linux-gnu -O1 -c q.cc ^C real 0m45.072s user 0m0.025s sys 0m0.099s ```	2022-06-01 14:20:00 +02:00
Sanjay Patel	2bf6123f22	[InstCombine] fold icmp of sext bool based on limited range X <=u (sext i1 Y) --> (X == 0) \| Y https://alive2.llvm.org/ce/z/W_tZzo This is the conjugate/sibling pattern suggested with D126171 for a sign-extended bool value.	2022-05-31 12:37:56 -04:00
Nikita Popov	36cbdaa163	[InstCombine] Fix inbounds preservation when swapping GEPs (PR44206) When reassociating GEPs, we can only keep inbounds if both original GEPs were inbounds, and their offsets have the same sign. For the sake of simplicity, I only handle the case where both offsets are non-negative here. It would probably be fine to just not preserve inbounds at all here, but as I don't see a compile-time impact for adding the isKnownNonNegative() calls I went with this more conservative approach. Fixes https://github.com/llvm/llvm-project/issues/44206. Differential Revision: https://reviews.llvm.org/D126687	2022-05-31 15:45:02 +02:00
Danila Malyutin	4fb3fd7d82	[InstCombine] Fix const folding of switches with default case In case phi was in the default block it could lead to multi-edge. Fixes #55721. Differential Revision: https://reviews.llvm.org/D126650	2022-05-31 15:13:58 +03:00
Nikita Popov	872d69e5d4	[InstCombine] Fix inbounds preservation when merging GEPs (PR55722) Even if the total offset is inbounds, we might represent it by first performing a large negative offset and then a small positive one. With inbounds semantics as currently specified, each offset must be inbounds individually, not just the overall offset of the GEP. Fix this by checking that the sign of all offsets is the same. Fixes https://github.com/llvm/llvm-project/issues/55722.	2022-05-31 11:54:01 +02:00
Sanjay Patel	a0c3c60728	[InstCombine] fold shift-right-by-constant with shift-right-of-constant operand (C2 >> X) >> C1 --> (C2 >> C1) >> X The shift-left form of this transform has existed since: `16f18ed7b5` ...but it applies to matching shift right opcodes too: https://alive2.llvm.org/ce/z/c5eQms	2022-05-30 15:30:01 -04:00
Sanjay Patel	c5d942a4fb	[InstCombine] remove unnecessary one-use check from (C2 << X) << C1 fold The restriction goes back to: `16f18ed7b5` ...but the fold only replaces a shift with a shift, so that's not necessary. Generalizing to other opcodes is planned as a follow-up.	2022-05-30 15:17:54 -04:00
Nikita Popov	a770f534e6	[InstCombine] When swapping GEPs, only keep inbounds if both are If only one of the GEPs is inbounds, then after swapping, there is no guarantee that one of them will be inbounds as well (see e.g. https://alive2.llvm.org/ce/z/agaCnp). This is only a partial fix, because even if both are inbounds, the result is not necessarily inbounds (if the offsets have different signs).	2022-05-30 17:04:42 +02:00
Nikita Popov	2d7bab666f	[InstCombine] Always create new GEPs when swapping GEPs As the long explanatory comment attests, performing the modification in place is pretty tricky. Drop this unnecessary complexity and always create new instructions. This should be NFC-ish, but can probably cause difference due to worklist order.	2022-05-30 16:48:52 +02:00
zhongyunde	3e6ba89055	[InstCombine] Fold a mul with bool value into and Fixes https://github.com/llvm/llvm-project/issues/55599 X * Y --> X & Y, iff X, Y can be only {0, 1}. https://alive2.llvm.org/ce/z/_RsTKF Reviewed By: spatel, nikic Differential Revision: https://reviews.llvm.org/D126040	2022-05-30 21:05:00 +08:00
Chenbing Zheng	ef256ed58e	[InstCombine] bitcast (extractelement <1 x elt>, dest) -> bitcast(<1 x elt>, dest) Only handle the case where the dest type is a vector, to avoid the inverse transform in visitBitCast. Reviewed By: spatel Differential Revision: https://reviews.llvm.org/D125951	2022-05-30 10:16:32 +08:00
Sanjay Patel	b5b6aa4d53	[InstCombine] fold multiply by signbit-splat to cmp+select (ashr i32 X, 31) * C --> (X < 0) ? -C : 0 https://alive2.llvm.org/ce/z/G8u9SS With a constant operand, this is an improvement in IR and codegen (where it can be converted to a mask op). Without a constant operand, we would have to negate the operand, so that is probably better left to the backend. This is similar but not the same optimization that is requested in #55618.	2022-05-27 11:54:19 -04:00
Sanjay Patel	5a6e085757	[InstCombine] reduce code duplication; NFC	2022-05-27 11:54:19 -04:00
Sanjay Patel	c4c750058f	[InstCombine] fold mul of signbit directly to X < 0 ? Y : 0 This is effectively NFC (intentionally no test diffs) because we already have the related fold that converts the 'and' pattern to select. So this is just an efficiency improvement.	2022-05-26 16:19:15 -04:00
Sanjay Patel	49f8b05137	[InstCombine] fold icmp equality with sdiv and SMIN This extends the fold from D126410 / `3952c905ef` to allow for the only case where it works with signed division: https://alive2.llvm.org/ce/z/k7_ypu (X s/ Y) == SMIN --> (X == SMIN) && (Y == 1) (X s/ Y) != SMIN --> (X != SMIN) \|\| (Y != 1) This is another improvement based on #55695.	2022-05-26 16:19:15 -04:00
Sanjay Patel	ed5be1523f	[InstCombine] reduce code duplication in icmp+div folds; NFC	2022-05-26 16:19:15 -04:00
Sanjay Patel	3952c905ef	[InstCombine] fold icmp equality with udiv and large constant With large compare constant: (X u/ Y) == C --> (X == C) && (Y == 1) (X u/ Y) != C --> (X != C) \|\| (Y != 1) https://alive2.llvm.org/ce/z/EhKwh6 There are various potential missing icmp (div) transforms shown here: https://github.com/llvm/llvm-project/issues/55695 This is a generalization for part of the udiv + equality. I didn't check in detail, but some of those may only make sense as codegen transforms. This results in one extra instruction in IR, but it is better for analysis, and looks much better in codegen on all targets that I tried. Differential Revision: https://reviews.llvm.org/D126410	2022-05-26 09:08:47 -04:00
Chenbing Zheng	1486a9c9fe	[InstCombine] [NFC] refactor foldXorOfICmps Reviewed By: nikic Differential Revision: https://reviews.llvm.org/D126268	2022-05-26 11:07:18 +08:00
Chenbing Zheng	41aab93afc	[InstCombine] bitcast(logic(bitcast(X), bitcast(Y))) -> bitcast'(logic(bitcast'(X), Y)) This patch lifts the limitation in foldBitCastBitwiseLogic that the destination must have an integer element type, and eliminates one bitcast by doing the logic op in the type of the input that has an integer element type. Reviewed By: spatel Differential Revision: https://reviews.llvm.org/D126184	2022-05-26 10:23:44 +08:00
Chenbing Zheng	269e3f7369	[InstCombine] [NFC] Move transforms for truncated shifts into narrowBinOp Reviewed By: spatel Differential Revision: https://reviews.llvm.org/D126056	2022-05-25 10:21:39 +08:00
Sanjay Patel	05527b68a0	[InstCombine] fold more shuffles with FP<->Int cast operands shuffle (cast X), (cast Y), Mask --> cast (shuffle X, Y, Mask) This extends the transform added with `0353c2c996`. If the shuffle reduces vector length, the transform reduces the width of the cast, so that should be a win for most codegen (if not, it can be inverted).	2022-05-24 15:11:38 -04:00
Nikita Popov	e6e0eb3bc8	[InstCombine] Strip bitcasts in GEP diff fold Bitcasts were stripped in one case, but not the other. Of course, this no longer really matters with opaque pointers, but as I went through the trouble of tracking this down, we may as well remove one typed vs opaque pointer optimization discrepancy.	2022-05-24 16:12:01 +02:00
Nikita Popov	b2a13d3e2d	[InstCombine] Use IRBuilder in freeze pushing transform (PR55619) Use IRBuilder so that the newly created freeze instructions automatically get inserted back into the IC worklist. The changed worklist processing order leads to some cosmetic differences in tests. Fixes https://github.com/llvm/llvm-project/issues/55619.	2022-05-24 15:48:28 +02:00
Nikita Popov	a7c079aaa2	[InstCombine] Support logical and in masked icmp fold Most of the folds implemented in this function work fine with logical operations. We only need to be careful for the cases that work on non-constant masks, where the RHS operand shouldn't be poison. This is a conservative implementation that bails out of illegal transforms, but we could also change these to insert freeze instead.	2022-05-24 11:16:33 +02:00
Nikita Popov	5abaabed22	[InstCombine] Use m_APInt() in asymmetric masked icmp fold This is mostly intended as code cleanup, but it does also add support for splat vectors to this fold.	2022-05-24 10:57:28 +02:00
Nikita Popov	c0e06c7448	[InstCombine] Handle logical and/or in recursive and/or of icmps fold The and/or of icmps fold is also applied in reassociated form. However, this currently only happens for bitwise and of bitwise and, but not for bitwise and of logical and (or other combinations, but this is the one being addressed here). We can do this for bitwise+logical combinations as well, but need to be a bit careful about which of the resulting ands are logical: https://alive2.llvm.org/ce/z/WYSjGh https://alive2.llvm.org/ce/z/guxYnz https://alive2.llvm.org/ce/z/S5SYxY https://alive2.llvm.org/ce/z/2rAWeW	2022-05-24 10:13:10 +02:00
Sanjay Patel	e8c20d995b	[IR] add and use pattern match specialization for sqrt intrinsic; NFC This was included in D126190 originally, but it's independent and a useful change for readability.	2022-05-23 14:16:30 -04:00
Nikita Popov	f45c1e436e	[InstCombine] Change operand order in recursive and/or of icmps fold The order obviously doesn't matter for bitwise and/or, but would matter for logical and/or, so change it to preserve the original order.	2022-05-23 17:29:33 +02:00
Sanjay Patel	1ebad988b1	[InstCombine] fold icmp of zext bool based on limited range X <u (zext i1 Y) --> (X == 0) && Y https://alive2.llvm.org/ce/z/avQDRY This is a generalization of `4069cccf3b` based on the post-commit suggestion. This also adds the i1 type check and tests that were missing from the earlier attempt; that commit caused several bot fails and was reverted. Differential Revision: https://reviews.llvm.org/D126171	2022-05-23 09:59:21 -04:00
Nikita Popov	45226d04f0	[InstCombine] Reuse icmp of and/or folds for logical and/or Similarly to a change recently done for fcmps, add a flag that indicates whether the and/or is logical to foldAndOrOfICmps, and reuse the function when folding logical and/or. We were already calling some parts of it, but this gives us a clearer indication of which parts may need poison-safe variants, and would also allow to fold combinations of bitwise and logical and/or. This change should be close to NFC, because all folds this enables were either already called previously, or can make use of implied poison reasoning.	2022-05-23 15:37:07 +02:00
Sanjay Patel	cba0ebd576	Revert "[InstCombine] fold icmp with sub and bool" This reverts commit `4069cccf3b`. This causes bot failures, and there's possibly a better way to get this and other patterns.	2022-05-22 12:13:20 -04:00
Sanjay Patel	4069cccf3b	[InstCombine] fold icmp with sub and bool This is the specific pattern seen in #53432, but it can be extended in multiple ways: 1. The 'zext' could be an 'and' 2. The 'sub' could be some other binop with a similar ==0 property (udiv). There might be some way to generalize using knownbits, but that would require checking that the 'bool' value is created with some instruction that can be replaced with new icmp+logic. https://alive2.llvm.org/ce/z/-KCfpa	2022-05-22 11:51:07 -04:00
Sanjay Patel	f0071d43e4	[InstCombine] add use check to fold of bitwise logic with cast ops This was shown as a potential regression in D126040.	2022-05-20 09:08:53 -04:00
Chenbing Zheng	cf348f6a2c	[InstCombine] [NFC] Use a pattern matcher for ExtractElementInst Reviewed By: RKSimon, rampitec Differential Revision: https://reviews.llvm.org/D125857	2022-05-20 10:31:40 +08:00
Jay Foad	6bec3e9303	[APInt] Remove all uses of zextOrSelf, sextOrSelf and truncOrSelf Most clients only used these methods because they wanted to be able to extend or truncate to the same bit width (which is a no-op). Now that the standard zext, sext and trunc allow this, there is no reason to use the OrSelf versions. The OrSelf versions additionally have the strange behaviour of allowing extending to a smaller width, or truncating to a larger width, which are also treated as no-ops. A small amount of client code relied on this (ConstantRange::castOp and MicrosoftCXXNameMangler::mangleNumber) and needed rewriting. Differential Revision: https://reviews.llvm.org/D125557	2022-05-19 11:23:13 +01:00
Chenbing Zheng	ffaaf2498b	[InstCombine] (rot X, ?) == 0/-1 --> X == 0/-1 In this patch we add a function foldICmpInstWithConstantAllowUndef to fold integer comparisons with a constant operand: icmp Pred X, C where X is some kind of instruction and C is AllowUndef. We move this fold to the new function, so that it can handle undef elements in a vector. Reviewed By: spatel, RKSimon Differential Revision: https://reviews.llvm.org/D125220	2022-05-19 11:22:26 +08:00
Chenbing Zheng	51df77f36d	[InstCombine] Allow undef vectors when foldSelectToCopysign Reviewed By: spatel Differential Revision: https://reviews.llvm.org/D125671	2022-05-19 10:57:49 +08:00
Sanjay Patel	ebbc37391f	[InstCombine] allow variable shift amount in bswap + shift fold When shifting by a byte-multiple: bswap (shl X, Y) --> lshr (bswap X), Y bswap (lshr X, Y) --> shl (bswap X), Y This was limited to constants as a first step in D122010 / `60820e53ec` , but issue #55327 shows a source example (and there's a test based on that here) where a variable shift amount is used in this pattern.	2022-05-18 14:38:16 -04:00
Sanjay Patel	990cc49ca0	[InstCombine] avoid crash on fold of icmp with cast operand We could do better by inserting a bitcast from scalar int to vector int or using an insertelement (the alternate test does not crash because there's an independent fold like that). But this doesn't seem like a likely pattern, so just bail out for now. Fixes issue #55516.	2022-05-18 09:16:30 -04:00
Sanjay Patel	be6d7cc93c	[InstCombine] reduce code duplication for checking types; NFC	2022-05-18 09:16:30 -04:00
Sanjay Patel	dbf3b5f114	[InstCombine] fold more shuffles with FP<->Int cast operands shuffle (cast X), (cast Y), Mask --> cast (shuffle X, Y, Mask) This extends the transform added with `0353c2c996`. If the casts are to a larger element type, the transform reduces shuffle bit width, so that should be a win for most codegen (if not, it can be inverted).	2022-05-17 14:25:11 -04:00
Sanjay Patel	f31d39c42c	[InstCombine] remove cast-of-signbit to shift transform The transform was wrong in 3 ways: 1. It created an extra instruction when the source and dest types don't match. 2. It did not account for an extra use of the icmp, so could create 2 extra insts. 3. It favored bit hacks over icmp (icmp generally has better analysis). This fixes #54692 (modeled by the PhaseOrdering tests). This is a minimal step to fix the bug, but we should likely invert this and the sibling transform for the "is negative" pattern too. The backend should be able to invert this back to a shift if that leads to better codegen. This is a reduced try of `3794cc0e99` - that was reverted because it could cause infinite loops by conflicting with the related transforms in this block that create shifts.	2022-05-17 11:10:28 -04:00
Nikita Popov	a694546f7c	[KnownBits] Add operator== Checking whether two KnownBits are the same is somewhat common, mainly in test code. I don't think there is a lot of room for confusion with "determine what the KnownBits for an icmp eq would be", as that has a different result type (this is what the eq() method implements, which returns Optional<bool>). Differential Revision: https://reviews.llvm.org/D125692	2022-05-17 09:38:13 +02:00
Sanjay Patel	07d549bce9	Revert "[InstCombine] invert canonicalization for cast of signbit test" This reverts commit `3794cc0e99`. This change is suspected of causing bots to hang at stage 2 compiles, so reverting to confirm and investigate.	2022-05-16 17:47:02 -04:00
Sanjay Patel	3794cc0e99	[InstCombine] invert canonicalization for cast of signbit test The existing transform was wrong in 3 ways: 1. It created an extra instruction when the source and dest types don't match. 2. It did not account for an extra use of the icmp, so could create 2 extra insts. 3. It favored bit hacks over icmp (icmp generally has better analysis). This fixes #54692 (modeled by the PhaseOrdering tests). This is a minimal step to fix the bug, but we should likely invert the sibling transform for the "is negative" pattern too. The backend should be able to invert this back to a shift if that leads to better codegen.	2022-05-16 12:55:52 -04:00
Sanjay Patel	be7f09f7b2	[IR] create and use helper functions that test the signbit; NFCI	2022-05-16 11:26:23 -04:00
Biplob Mishra	ec4adf1f6c	[InstCombine] Combine instructions of type or/and where AND masks can be combined. The patch simplifies some of the patterns as below (A \| (B & C0)) \| (B & C1) -> A \| (B & C0\|C1) ((B & C0) \| A) \| (B & C1) -> (B & C0\|C1) \| A In some scenarios like byte reverse on half word, we can see this pattern multiple times and this conversion can optimize these patterns. Differential Revision: https://reviews.llvm.org/D124119	2022-05-16 12:43:33 +01:00
Chenbing Zheng	acbad5086a	[InstCombine] [NFC] separate a function foldICmpBinOpWithConstant There is a long function foldICmpInstWithConstant, we can separate a function foldICmpBinOpWithConstant from it. Reviewed By: spatel Differential Revision: https://reviews.llvm.org/D125457	2022-05-14 10:54:15 +08:00
Nikita Popov	ed1cb01baf	[IRBuilder] Add IsInBounds parameter to CreateGEP() We commonly want to create either an inbounds or non-inbounds GEP based on a boolean value, e.g. when preserving inbounds from existing GEPs. Directly accept such a boolean in the API, rather than requiring a ternary between CreateGEP and CreateInBoundsGEP. This change is not entirely NFC, because we now preserve an inbounds flag in a constant expression edge-case in InstCombine.	2022-05-13 14:30:55 +02:00
Nikita Popov	d9ad6a2c8b	[InstCombine] Fix unused variable warning (NFC)	2022-05-13 12:43:21 +02:00
Chenbing Zheng	2a0837aab1	[InstCombine] fix sub(add(X,Y),umin(Y,Z)) --> add(X,usub.sat(Y,Z)) This patch fixes a bug left in D124503. We should do sub(add(X,Z),umin(Y,Z)) --> add(X,usub.sat(Z,Y)) instead of sub(add(X,Z),umin(Y,Z)) --> add(X,usub.sat(Y,Z)). Reviewed By: spatel Differential Revision: https://reviews.llvm.org/D125352	2022-05-13 09:54:10 +08:00
Sanjay Patel	2fa8fc3d0a	[InstCombine] freeze operand in div+mul fold As discussed in issue #37809, this transform is not safe if the input is an undefined value. This is similar to recent changes for urem and sdiv: `d428f09b2c` `99ef341ce9` There is no difference in codegen on the basic examples, but this could lead to regressions. We may need to improve freeze analysis or lowering if that happens. Presumably, in real cases that are similar to the tests where a subsequent transform removes the rem, we will also be able to remove the freeze by seeing that the parameter has 'noundef'.	2022-05-12 13:49:29 -04:00
Sanjay Patel	99ef341ce9	[InstCombine] freeze operand in sdiv expansion As discussed in issue #37809, this transform is not safe if the input is an undefined value. This is similar to a recent change for urem: `d428f09b2c` There is no difference in codegen on the basic examples, but this could lead to regressions. We may need to improve freeze analysis or lowering if that happens. Presumably, in real cases that are similar to the tests where a subsequent transform removes the select, we will also be able to remove the freeze by seeing that the parameter has 'noundef'.	2022-05-11 14:01:28 -04:00
Sanjay Patel	d428f09b2c	[InstCombine] freeze operand in urem expansion As discussed in issue #37809, this transform is not safe if the input is an undefined value. There is no difference in codegen on the basic examples, but this could lead to regressions. We may need to improve freeze analysis or lowering if that happens.	2022-05-11 12:47:26 -04:00
Nikita Popov	6001bfcedc	[InstCombine] Freeze other uses of frozen value If there is a freeze %x, we currently replace all other uses of %x with freeze %x -- as long as they are dominated by the freeze instruction. This patch extends this behavior to cases where we did not originally dominate the use by moving the freeze instruction directly after the definition of the frozen value. The motivation can be seen in test @combine_and_after_freezing_uses: Canonicalizing everything to freeze %x allows folds that are based on value identity (i.e. same operand occurring in two places) to trigger. This also covers the case from D125248. Differential Revision: https://reviews.llvm.org/D125321	2022-05-11 16:47:12 +02:00
Sanjay Patel	0353c2c996	[InstCombine] fold shuffles with FP<->Int cast operands shuffle (cast X), (cast Y), Mask --> cast (shuffle X, Y, Mask) This is similar to a recent transform with fneg ( `b331a7ebc1` ), but this is intentionally the most conservative first step to try to avoid regressions in codegen. There are several restrictions that could be removed as follow-up enhancements. Note that a cast with a unary shuffle is currently canonicalized in the other direction (shuffle after cast - D103038 ). We might want to invert that to be consistent with this patch.	2022-05-10 14:20:43 -04:00
Nikita Popov	d222bab672	[InstCombine] Handle GEP scalar/vector base mismatch (PR55363) `30a12f3f63` switched the type check to use the GEP result type rather than the GEP operand type. However, the GEP result types may match even if the operand types don't, in case GEPs with scalar/vector base and vector index are compared. Fixes https://github.com/llvm/llvm-project/issues/55363.	2022-05-10 11:26:43 +02:00
Sanjay Patel	8650f05c97	[InstCombine] fix miscompile when casting int->FP->int As shown in https://github.com/llvm/llvm-project/issues/55150 - the existing fold may be wrong when converting to a signed value. This is a quick fix to avoid the miscompile. I added tests/comments for all of the signed/unsigned combinations at either side of the boundary width, and tried to confirm with Alive2: https://alive2.llvm.org/ce/z/3p9DSu There are already some TODO items in the test file that suggest possible refinements, so the regression with ui->FP->si is probably ok. It seems unlikely that we'd see these kind of edge cases with non-byte-width integer types in real code. The potential miscompile went undetected for several years. This and `747c6a0c73` fixes #55150. Differential Revision: https://reviews.llvm.org/D124692	2022-05-07 08:46:25 -04:00
Serge Pavlov	eb28da89a6	[InstCombine] Remove side effect of replaced constrained intrinsics If a constrained intrinsic call was replaced by some value, it was not removed in some cases. The dangling instruction resulted in useless instructions being executed at runtime. This happened because constrained intrinsics usually have a side effect, which is used to model the interaction with the floating-point environment. In some cases the side effect is actually absent or can be ignored. This change adds specific treatment of constrained intrinsics so that their side effect can be removed if it is actually absent. Differential Revision: https://reviews.llvm.org/D118426	2022-05-07 19:04:11 +07:00
Chenbing Zheng	394c683d40	[InstCombine] sub(add(X,Y),umin(Y,Z)) --> add(X,usub.sat(Y,Z)) Alive2: https://alive2.llvm.org/ce/z/2UNVbp Reviewed By: RKSimon, spatel Differential Revision: https://reviews.llvm.org/D124503	2022-05-07 17:17:48 +08:00
Chenbing Zheng	8eaa1ef0d8	[InstCombine] add casts from splat-a-bit pattern if necessary Splatting a bit of constant-index across a value: sext (ashr (trunc iN X to iM), M-1) to iN --> ashr (shl X, N-M), N-1 If the dest type is different, use a cast (adjust use check). https://alive2.llvm.org/ce/z/acAan3 Reviewed By: spatel Differential Revision: https://reviews.llvm.org/D124590	2022-05-07 15:34:57 +08:00
Sanjay Patel	b331a7ebc1	[InstCombine] canonicalize fneg after shuffle For the unary shuffle pattern, this is opposite to what we try to do with binops, but it seems better to keep it consistent with the motivating binary shuffle pattern. On that, it is clearly better on the usual no-extra uses case. There is a chance that this will pull an fneg away from some other binop and cause a regression in codegen, but that should be invertible in the backend. The transform is bidirectional: https://alive2.llvm.org/ce/z/kKaKCU https://alive2.llvm.org/ce/z/3Desfw Fixes #45631	2022-05-06 16:30:26 -04:00
Nikita Popov	82190f917a	[InstCombine] Fold icmp of select with implied condition When threading the icmp over the select, check whether the condition can be folded when taking into account the select condition.	2022-05-06 17:13:32 +02:00
Nikita Popov	0863abe3ac	[InstCombine] Fold icmp of select with non-constant operand Try to push an icmp into a select even if the icmp operand isn't constant - perform a generic SimplifyICmpInst instead. This doesn't appear to impact compile-time much, and forming logical and/or is generally profitable, as we have very good support for them.	2022-05-06 16:04:39 +02:00
Nikita Popov	b457ac4240	[InstCombine] Extract icmp of select transform (NFC) To make it easier to extend to the case where the other operand is not a constant.	2022-05-06 14:46:44 +02:00
Fraser Cormack	bafab9c09f	[InstCombine] Fix scalable-vector bitwise select matching D113035 enhanced the matching of bitwise selects from vector types. This change unfortunately introduced crashes as it tries to cast scalable vector types to integers. Reviewed By: spatel Differential Revision: https://reviews.llvm.org/D124997	2022-05-06 12:59:39 +01:00
Chenbing Zheng	4c8c101b49	[InstCombine] try to narrow more shifted bswap-of-zext Try to narrow more bswap, if the shift amount is less than the zext (bswap (zext X)) >> C --> (zext (bswap X)) << C' https://alive2.llvm.org/ce/z/i7ddjn Reviewed By: RKSimon Differential Revision: https://reviews.llvm.org/D124598	2022-05-06 10:45:10 +08:00
Serge Pavlov	e1554ac63a	Revert "[InstCombine] Remove side effect of replaced constrained intrinsics" This reverts commit `83914ee96f`. The change caused discussion: https://lists.llvm.org/pipermail/llvm-commits/Week-of-Mon-20220502/1034841.html	2022-05-06 01:09:16 +07:00
Serge Pavlov	83914ee96f	[InstCombine] Remove side effect of replaced constrained intrinsics If a constrained intrinsic call was replaced by some value, it was not removed in some cases. The dangling instruction resulted in useless instructions being executed at runtime. This happened because constrained intrinsics usually have a side effect, which is used to model the interaction with the floating-point environment. In some cases it is correct behavior but often the side effect is actually absent or can be ignored. This change adds specific treatment of constrained intrinsics so that their side effect can be removed if it is actually absent. Differential Revision: https://reviews.llvm.org/D118426	2022-05-05 12:02:42 +07:00
Alexander Shaposhnikov	ec7122f64b	[InstCombine] Fold ((A&B)^C)\|B Fold ((A&B)^C)\|B into C\|B. https://alive2.llvm.org/ce/z/zSGSor This addresses the issue https://github.com/llvm/llvm-project/issues/55169 Test plan: ninja check-all Differential revision: https://reviews.llvm.org/D124710	2022-05-05 00:56:20 +00:00
Sanjay Patel	14f257620c	[InstCombine] add type constraint to intrinsic+shuffle fold This check is in the related fold for binops, but it was missed when the code was adapted for intrinsics in `432c199e84`. The new test would crash when trying to create a new intrinsic with mismatched types.	2022-05-04 13:07:26 -04:00
Sanjay Patel	7e6d318c50	[InstCombine] move shuffle after funnel shift with same-shuffled operands This extends `432c199e84` and `9c4770eaab` with an intrinsic cited directly in issue #46238 Eventually, we will want to use llvm::isTriviallyVectorizable() or create some new API for this list, but for now, I am intentionally making a minimum change to reduce risk and only affect an intrinsic with regression tests in place.	2022-05-04 13:07:26 -04:00
Sanjay Patel	15042f44a2	[InstCombine] propagate FMF when reordering intrinsics and shuffles This was missed when extending the fold to allow fma with `9c4770eaab`	2022-05-04 12:10:38 -04:00
Sanjay Patel	9c4770eaab	[InstCombine] move shuffle after fma with same-shuffled operands https://alive2.llvm.org/ce/z/sD-JVv This extends `432c199e84` with a 3 arg intrinsic to demonstrate that the code works with the extra operand. Eventually, we will want to use llvm::isTriviallyVectorizable() or create some new API for this list, but for now, I am intentionally making a minimum change to reduce risk and only affect an intrinsic with regression tests in place.	2022-05-04 11:50:38 -04:00
Sanjay Patel	432c199e84	[InstCombine] move shuffle after min/max with same-shuffled operands This is an intrinsic version of the existing fold for binops. As a first step, I only allowed min/max, but the code is set up to make adding more intrinsics easy (with more or less than 2 arguments). This (and possible follow-ups) are discussed in issue #46238.	2022-05-03 16:23:11 -04:00
Jonas Paulsson	304378fd09	Reapply "[BuildLibCalls] Introduce getOrInsertLibFunc() for use when building libcalls." (was `0f8c626`). This reverts commit `14d9390`. The patch previously failed to recognize cases where the user had defined a function alias with an identical name as that of the library function. Module::getFunction() would then return nullptr which is what the sanitizer discovered. In this updated version, a new function isLibFuncEmittable() has also been introduced, which is now used instead of TLI->has() anytime a library function is to be emitted. It additionally makes sure there is e.g. no function alias with the same name in the module. Reviewed By: Eli Friedman Differential Revision: https://reviews.llvm.org/D123198	2022-05-02 19:37:00 +02:00
Nikita Popov	95fedfab6c	[InstCombine] Handle non-canonical GEP index in indexed compare fold (PR55228) Normally the index type will already be canonicalized here, but this is not guaranteed depending on visitation order. The code was already accounting for a potentially needed sext, but a trunc may also be needed. Add a ConstantExpr::getSExtOrTrunc() helper method to make this simpler. This matches the corresponding IRBuilder method in behavior. Fixes https://github.com/llvm/llvm-project/issues/55228.	2022-05-02 17:56:01 +02:00
Juneyoung Lee	40a2e35599	[InstCombine] Remove the undef-related workaround code in visitSelectInst This patch removes an old hack in visitSelectInst that was written to avoid miscompilation bugs in loop unswitch. (Added via https://reviews.llvm.org/D35811) The legacy loop unswitch pass will be removed after D124376, and the new simple loop unswitch pass correctly uses freeze to avoid introducing UB after D124252. Since the hack is not necessary anymore, this patch removes it. Reviewed By: nikic Differential Revision: https://reviews.llvm.org/D124426	2022-04-30 20:48:42 +09:00
Nikita Popov	1881711fbb	[InstCombine] Remove memset of undef value This removes memset with undef char. We already do this for stores of undef value. This comes with the caveat that this optimization is not, strictly speaking, legal for undef values, because we might be overwriting a poison value. However, our entire load/store model currently still operates on undef values, so we need to support undef here as well for internal consistency. Once https://github.com/llvm/llvm-project/issues/52930 is resolved, these and related folds can be limited to poison -- I've added FIXMEs to that effect. Differential Revision: https://reviews.llvm.org/D124173	2022-04-29 14:51:18 +02:00
Nikita Popov	982cbed819	[InstCombine] Fold logical and/or of range icmps with nowrap flags This is an edge-case where we don't convert to bitwise and/or based on implies poison reasoning, so explicitly try to perform the fold in logical form. The transform itself is poison-safe, as both icmps are based on the same value and any nowrap flags are discarded as part of the fold (https://alive2.llvm.org/ce/z/aCwC8b for the used example).	2022-04-29 14:42:42 +02:00
Nikita Popov	57aaeefc18	[InstCombine] Pass ICmpInsts to foldAndOrOfICmpsUsingRanges() (NFC) Pass the whole instruction rather than unpacking it. This makes it easier to reuse the function in another place, as the entire logic is encapsulated.	2022-04-29 12:46:31 +02:00
Nikita Popov	1f53932a95	[InstCombine] Remove foldAndOrOfEqualityCmpsWithConstants() fold This fold handles a special subset of foldAndOrOfICmpsUsingRanges(), use the more generic implementation instead. The result can differ if a representation using a range comparison is possible, in which case that is preferred over masking. There is a canonicalization opportunity here.	2022-04-29 12:23:00 +02:00
Nikita Popov	5515263e44	[InstCombine] Fold and of two ranges differing by mask This is the de Morgan conjugated variant of the existing fold for ors. Implement this by switching the range code to always work on ors and invert the operands at the start and end. This makes reasoning easier and makes the extension more obviously correct.	2022-04-29 12:01:38 +02:00
Nikita Popov	d5ee20fcc9	[InstCombine] Switch an or of icmps fold to use constant ranges We can express this fold more naturally when working on the constant range implementation. This change is not entirely NFC, because the code now also handles cases that don't match the precise pattern this previously looked for, e.g. we can omit an add on one of the ranges.	2022-04-29 11:15:54 +02:00
Nikita Popov	90dba831ae	[InstCombine] Fold or of icmp ne trunc/and This adds the de Morgan conjugated variant for the existing "and eq" style fold. Proof: https://alive2.llvm.org/ce/z/tkNAcG	2022-04-28 15:07:16 +02:00
Nicolas Abram Lujan	f8a574bf4d	[InstCombine] C0 >> (X - C1) --> (C0 << C1) >> X With the right pre-conditions, we can fold the offset into the shifted constant: https://alive2.llvm.org/ce/z/drMRBU https://alive2.llvm.org/ce/z/cUQv-_ Fixes #55016 Differential Revision: https://reviews.llvm.org/D124369	2022-04-27 14:18:30 -04:00
Roman Lebedev	ffafa71f64	[InstCombine] 'round up integer': if bias is just right, just reuse instructions This is only useful if we can't create new instruction because %x.aligned has other uses and already sticks around.	2022-04-27 17:27:02 +03:00
Roman Lebedev	aac0afd1dd	[InstCombine] Fold 'round up integer' pattern (when alignment is a power of two) But don't deal with non-splats. The test coverage is sufficiently exhaustive, and alive is happy about the changes there. Example with constants: https://alive2.llvm.org/ce/z/EUaJ5- / https://alive2.llvm.org/ce/z/Bkng2X General proof: https://alive2.llvm.org/ce/z/3RjJ5A	2022-04-27 17:26:55 +03:00
Nikita Popov	c103f5e9da	[InstCombine] Combine opaque pointer GEPs with mismatching element types Currently, two GEPs will only be combined if the result element type of one is the same as the source element type of the other. However, this means we may miss folding opportunities where the second GEP could be rewritten using a different element type. This is especially relevant for opaque pointers, where constant GEPs often use i8 element type. Address this by converting GEP indices to offsets, adding them, and then converting them back to indices. The first (inner) GEP is allowed to have variable indices as well, in which case only the constant suffix is converted into an offset. This should address the regression reported in https://reviews.llvm.org/D123300#3467615. Differential Revision: https://reviews.llvm.org/D124459	2022-04-27 09:33:47 +02:00
Ricky Zhou	4041c44853	[InstCombine] Update predicate when canonicalizing comparisons in canonicalizeClampLike. canonicalizeClampLike canonicalizes the ule/ugt comparisons to ult/uge, respectively. However, it does not update the variable holding the comparison predicate type after doing this. Later code fails to handle the non-canonical predicate type (specifically, the swap of ThresholdLowIncl and ThresholdHighExcl when Pred0 has been canonicalized from ugt to uge). This leads to the miscompile reported in PR53252. Fix this by updating the comparison predicate after canonicalizing. Fixes #53252 Differential Revision: https://reviews.llvm.org/D119690	2022-04-26 17:35:45 -04:00
Sanjay Patel	903aa5e0f8	[InstCombine] try to fold icmp with mismatched extended operands If a value is known to be non-negative and zexted, that's the same thing as sexted. So for the purpose of looking past the casts with an icmp, treat it as if it was a sext: https://alive2.llvm.org/ce/z/_BDsGV This is necessary, but not enough to solve the motivating problem: https://github.com/llvm/llvm-project/issues/55013 Differential Revision: https://reviews.llvm.org/D124419	2022-04-26 14:26:36 -04:00
Sanjay Patel	c8ed784ee6	[InstCombine] fold freeze of partial undef/poison vector constants We can always replace the undef elements in a vector constant with regular constants to get rid of the freeze: https://alive2.llvm.org/ce/z/nfRb4F The select diffs show that we might do better by adjusting the logic for a frozen select condition. We may also want to refine the vector constant replacement to consider forming a splat. Differential Revision: https://reviews.llvm.org/D123962	2022-04-26 14:16:11 -04:00
Sanjay Patel	6631907ad2	[InstCombine] use isKnownNonNegative to reduce code duplication; NFC We may be able to make the ValueTracking wrapper smarter in the future (for example, analyze a simple recurrence), so this will automatically benefit if that happens.	2022-04-25 17:13:29 -04:00
Nikita Popov	e8945110d2	[InstCombine] Remove redundant unsigned underflow fold (NFCI) This is now handled as a combination of two other folds: (A+B) <= A & (A+B) != 0 --> (A+B)-1 < A (A+B)-1 < A --> -B < A	2022-04-25 14:22:43 +02:00
Nikita Popov	ee50925894	[InstCombine] Fold (X != 0) & (Y u>= X) This adds the De Morgan conjugated fold for the existing (X == 0) \| (Y u< X) fold. Proof: https://alive2.llvm.org/ce/z/3Me3JQ	2022-04-25 13:16:47 +02:00
Nikita Popov	2bec8d6d59	[InstCombine] Fold X + Y + C u< X This is a variation on the X + Y u< X fold with an extra constant. Proof: https://alive2.llvm.org/ce/z/VNb8pY	2022-04-25 12:53:39 +02:00
Chenbing Zheng	5805cfb901	[InstCombine] Complete folding of fneg-of-fabs This patch adds a function foldSelectWithFCmpToFabs and performs more combines for fneg-of-fabs. With 'nsz': fold (X < +/-0.0) ? X : -X or (X <= +/-0.0) ? X : -X to -fabs(x) fold (X > +/-0.0) ? X : -X or (X >= +/-0.0) ? X : -X to -fabs(x) Reviewed By: spatel Differential Revision: https://reviews.llvm.org/D123830	2022-04-25 09:53:36 +08:00
Simon Pilgrim	ffe13960b5	[InstCombine] Fold (A & 2^C1) + A => A & (2^C1 - 1) iff bit C1 in A is a sign bit (PR21929) Alive2: https://alive2.llvm.org/ce/z/Ygq26C This is the final missing fold to handle the modulo2 simplification: https://github.com/llvm/llvm-project/issues/22303 Fixes #22303 Differential Revision: https://reviews.llvm.org/D123374	2022-04-22 16:59:02 +01:00
Nikita Popov	369ef9bf60	[InstCombine] Extract code for or of icmp eq zero and icmp fold (NFC) To make it easier to extend this to the congruent and case.	2022-04-22 16:48:59 +02:00
Nikita Popov	ba46ae7bd8	[InstCombine] Merge foldAndOfICmps() and foldOrOfICmps() (NFCI) Folds are supposed to always be added in conjugated pairs for and and or. Merge the two functions to make folds for which this is currently not the case more obvious.	2022-04-22 12:48:03 +02:00
Nikita Popov	3e1d2c352c	[InstCombine] Fix or of commuted foldable predicates `1d90e53044` switched this code to store the predicates and operands in variables, but retained a swapOperands() call here. Thus the commuted cases were no longer folded. Additionally, as the change was not reported, the next InstCombine iteration would not pick it up either.	2022-04-22 12:31:26 +02:00
Sanjay Patel	664ae7bbcc	[InstCombine] C0 <<{nsw, nuw} (X - C1) --> (C0 >> C1) << X (2nd try) The first attempt at this missed a check to make sure the offset constant was in range and caused many bot failures. That was missed in the Alive2 proof because an overshift creates poison rather than the assert from APInt. Here's an alternate attempt at a proof using count-trailing-zeros: https://alive2.llvm.org/ce/z/pnXQYR Original commit message: This is similar to an existing pre-shift-of-constant fold: `8a9c70fc01` ...but in this case, we need no-wrap on the shl and a negative offset: https://alive2.llvm.org/ce/z/_RVz99	2022-04-21 16:18:46 -04:00
chenglin.bi	25aba1abb5	Revert "[InstCombine] Add one use limitation for (X * C2) << C1 --> X * (C2 << C1)" This reverts commit `b543d28df7`.	2022-04-22 00:56:20 +08:00
chenglin.bi	b543d28df7	[InstCombine] Add one use limitation for (X * C2) << C1 --> X * (C2 << C1) Follow up D123453, add one-use limitation for (X * C2) << C1 --> X * (C2 << C1) to make consistent with lshr (mul nuw x, MulC), ShAmtC -> mul nuw x, (MulC >> ShAmtC) Reviewed By: spatel Differential Revision: https://reviews.llvm.org/D124183	2022-04-22 00:32:36 +08:00
Sanjay Patel	8960ba7491	Revert "[InstCombine] C0 <<{nsw, nuw} (X - C1) --> (C0 >> C1) << X" This reverts commit `5819f4a422`. This caused bots to fail with a crash/assert during the fold, so some constraint was missed.	2022-04-21 12:15:27 -04:00
Sanjay Patel	5819f4a422	[InstCombine] C0 <<{nsw, nuw} (X - C1) --> (C0 >> C1) << X This is similar to an existing pre-shift-of-constant fold: `8a9c70fc01` ...but in this case, we need no-wrap on the shl and a negative offset: https://alive2.llvm.org/ce/z/_RVz99 Fixes #54890	2022-04-21 11:38:27 -04:00
Nikita Popov	46c2b41d02	[InstCombine] Remove dead code (NFC) This was a leftover condition without code.	2022-04-21 15:53:53 +02:00
Craig Topper	e3f6c2d288	[InstCombine] Don't look through bitcast from vector in collectInsertionElements. We're making a recursive call here and everything in the function assumes we're looking at scalars. This would be violated if we looked through a bitcast from vectors. Reviewed By: spatel Differential Revision: https://reviews.llvm.org/D124015	2022-04-20 09:15:32 -07:00
chenglin.bi	1fae4b492d	[InstCombine] Fold mul nuw+lshr to a single multiplication when the latter is a factor if c is divisible by (1 << ShAmtC), we can fold this pattern: lshr (mul nuw x, c), ShAmtC -> mul nuw x, (c >> ShAmtC) https://alive2.llvm.org/ce/z/ox4wAt Fix https://github.com/llvm/llvm-project/issues/54824 Reviewed By: spatel, lebedev.ri, craig.topper Differential Revision: https://reviews.llvm.org/D123453	2022-04-21 00:13:36 +08:00
Sanjay Patel	bf09a925f2	[InstCombine] remove likely redundant ValueTracking-based folds for shifts This is not expected to have a functional difference as discussed in the post-commit comments for `8a9c70fc01`. All of the motivating tests for the older fold still optimize as expected because other code can infer the 'nuw'.	2022-04-20 11:28:31 -04:00
Sanjay Patel	8a9c70fc01	[InstCombine] C0 shift (X add nuw C) --> (C0 shift C) shift X With 'nuw' we can convert the increment of the shift amount into a pre-shift (constant fold) of the shifted constant: https://alive2.llvm.org/ce/z/FkTyR2 Fixes issue #41976	2022-04-19 15:21:34 -04:00
Sanjay Patel	3a27b51b27	[InstCombine] reduce code for freeze of undef The description was ambiguous about the behavior when both select arms are constant or both arms are not constant. I don't think there's any evidence to support either way, but this matches the code with a more specified description. We can extend this to deal with vector constants with undef/poison elements. Currently, those don't get folded anywhere.	2022-04-18 15:14:02 -04:00
Sanjay Patel	2c2568f39e	[InstCombine] canonicalize select with signbit test This is part of solving issue #54750 - in that example we have both forms of the compare and do not recognize the equivalence.	2022-04-14 14:28:47 -04:00
Liqin Weng	fa4b4f0fcb	[InstCombine] fold more constant remainder to select-of-constants remainder Reviewed By: xbolva00, spatel, Chenbing.Zheng Differential Revision: https://reviews.llvm.org/D123486	2022-04-12 09:40:56 +08:00
Alexander Shaposhnikov	f6bb156fb1	[InstCombine] Fold icmp(X) ? f(X) : C This diff extends foldSelectInstWithICmp to handle the case icmp(X) ? f(X) : C when f(X) is guaranteed to be equal to C for all X in the exact range of the inverse predicate. This addresses the issue https://github.com/llvm/llvm-project/issues/54089. Differential revision: https://reviews.llvm.org/D123159 Test plan: make check-all	2022-04-12 01:32:55 +00:00
Sanjay Patel	1206a18d41	[InstCombine] guard against splat-mul corner case The test is already simplified, and I'm not sure how to write a test to exercise the new clause. But it protects the 2-bit pattern from miscompiling as noted in D123453. https://alive2.llvm.org/ce/z/QPyVfv (If we managed to fall into the mul transform, it would wrongly create a zero on this pattern.)	2022-04-11 15:50:13 -04:00
Sanjay Patel	7783db55af	[InstCombine] try to fold low-mask of ashr to lshr With one-use, we handle this via demanded-bits. But we need to handle extra uses to improve issue #54750. https://alive2.llvm.org/ce/z/aDYkPv	2022-04-11 11:56:40 -04:00
Simon Pilgrim	431e93f4f5	[InstCombine] Fold sub(add(x,y),min/max(x,y)) -> max/min(x,y) (PR38280) As discussed on Issue #37628, we can flip a min/max node if we're subtracting from the sum of the node's operands Alive2: https://alive2.llvm.org/ce/z/W_KXfy Differential Revision: https://reviews.llvm.org/D123399	2022-04-11 11:32:56 +01:00
serge-sans-paille	aa15ea47e2	[builtin_object_size] Basic support for posix_memalign It actually implements support for seeing through loads, using alias analysis to refine the result. This is rather limited, but I didn't want to rely on more than available analysis at that point (to be gentle with compilation time), and it does seem to catch common scenario, as showcased by the included tests. Differential Revision: https://reviews.llvm.org/D122431	2022-04-08 09:31:11 +02:00
Chenbing Zheng	467cbb6249	[InstCombine] fold more constant divisor to select-of-constants divisor By adding a parameter to the function FoldOpIntoSelect, we can fold more Ops to Select. For this example, we tend to fold the division instruction, so we no longer care whether SelectInst is one use. This patch resolves the TODO left in InstCombine/div.ll. Reviewed By: RKSimon Differential Revision: https://reviews.llvm.org/D122967	2022-04-08 10:19:24 +08:00
Augie Fackler	f3c702fbd1	InstCombineCalls: fix annotateAnyAllocCallSite to report changes Spotted during review of D123052. Differential Revision: https://reviews.llvm.org/D123232	2022-04-07 13:49:09 -04:00
Augie Fackler	f120be6c86	InstCombineCalls: when adding an align attribute, never reduce it Sometimes we can infer an align from an allocalign but the function already promised it'd be more-aligned than the allocalign and there's an existing align that we shouldn't reduce. Make sure we handle that correctly. Differential Revision: https://reviews.llvm.org/D121642	2022-04-07 12:38:44 -04:00
Augie Fackler	ca051a46fb	InstCombineCalls: infer return alignment from allocalign attributes This exposes a couple of lingering bugs, which will be fixed in the next two commits. Differential Revision: https://reviews.llvm.org/D123052	2022-04-07 12:38:44 -04:00
Simon Pilgrim	afa1ae9e0c	[InstCombine] SimplifyDemandedUseBits - allow and(srem(X,Pow2),C) -> and(X,C) to work on vector types Replace m_ConstantInt with m_APInt to match uniform (no-undef) vector remainder amounts.	2022-04-07 15:24:45 +01:00
Simon Pilgrim	5909c67883	[InstCombine] SimplifyDemandedUseBits - add TODO to remove shl node if we only demand known sign bits of the shift source Similar to what we already perform for ashr/lshr	2022-04-07 14:35:11 +01:00
Simon Pilgrim	5e90224839	[InstCombine] SimplifyDemandedUseBits - remove lshr node if we only demand known sign bit This is a lshr equivalent to D122340 - if we don't demand any of the additional sign bits introduced by the ashr, the lshr can be treated as an ashr and we can remove the shift entirely if we only demand already known sign bits. Another step towards PR21929 https://alive2.llvm.org/ce/z/6f3kjq Differential Revision: https://reviews.llvm.org/D123118	2022-04-07 14:33:31 +01:00
Matt Devereau	2c3f66519c	[SVE] Extend support for folding select + masked gathers Extend the work done in D106376 to include masked gathers Differential Revision: https://reviews.llvm.org/D122896	2022-04-05 16:27:11 +00:00
Alexander Shaposhnikov	6cf10b7e6e	[InstCombine] Fold srem(X, PowerOf2) == C into (X & Mask) == C for positive C This diff extends InstCombinerImpl::foldICmpSRemConstant to handle the cases srem(X, PowerOf2) == C and srem(X, PowerOf2) != C for positive C. This addresses the issue https://github.com/llvm/llvm-project/issues/54650 Differential revision: https://reviews.llvm.org/D122942 Test plan: make check-all	2022-04-03 03:57:05 +00:00
Sanjay Patel	5f8c2b884d	[InstCombine] limit icmp fold with sub if other sub user is a phi This is a hacky fix for: https://github.com/llvm/llvm-project/issues/54558 As discussed there, codegen regressed when we opened up this transform to allow extra uses ( `61580d0949` ), and it's not clear how to undo the transforms at the later stage of compilation. As noted in the code comments, there's a set of remaining folds that are still limited to one-use, so we can try harder to refine and expand the limitations on these folds, but it's likely to be an up-and-down battle as we find and overcome similar regressions. Differential Revision: https://reviews.llvm.org/D122909	2022-04-02 19:23:42 -04:00
Sanjay Patel	97ac0cd6c4	[InstCombine] fold fcmp with lossy casted constant (2nd try) This is a retry of `9397bdc67e` - that was reverted until we had a clang warning in place to alert users about a possible mistake in source. The warning was added with `ab982eace6`. This is noted as a missing clang warning in #54222, but it is also a missing optimization opportunity. Alive2 proofs: https://alive2.llvm.org/ce/z/Q8drDq https://alive2.llvm.org/ce/z/pE6LRt I don't see a single conversion for all predicates using "getFCmpCode" logic, so other predicates are left as a TODO item.	2022-04-02 19:23:01 -04:00
Roman Lebedev	308ca349cb	[InstCombine] Fold `(X \| C2) ^ C1 --> (X & ~C2) ^ (C1^C2)` These two are equivalent, and i think the `and` form is more-ish canonical. General proof: https://alive2.llvm.org/ce/z/RrF5s6 If constant on the (outer) `xor` is an `undef`, the whole lane is dead: https://alive2.llvm.org/ce/z/mu4Sh2 However, if the constant on the (inner) `or` is an `undef`, we must sanitize it first: https://alive2.llvm.org/ce/z/MHYJL7 I guess, producing a zero `and`-mask is optimal in that case. alive-tv is happy about the entirety of `xor-of-or.ll`.	2022-04-03 00:12:56 +03:00
Hirochika Matsumoto	a3cffc1150	[InstCombine] Fold (ctpop(X) == 1) \| (X == 0) into ctpop(X) < 2 https://alive2.llvm.org/ce/z/94yRMN Fixes #54177 Differential Revision: https://reviews.llvm.org/D122077	2022-03-29 11:30:06 -04:00
Nikita Popov	682ef39b1a	[InstCombine] Remove call to getPointerElementType() This was erroneously re-introduced as part of `bb0b23174e`.	2022-03-29 16:52:29 +02:00
Johannes Doerfert	bb0b23174e	[InstCombineCalls] Optimize call of bitcast even w/ parameter attributes Before we gave up if a call through bitcast had parameter attributes. Interestingly, we allowed attributes for the return value already. We now handle both the same way, namely, we drop the ones that are incompatible with the new type and keep the rest. This cannot cause "more UB" than initially present. Differential Revision: https://reviews.llvm.org/D119967	2022-03-28 20:57:52 -05:00
chenglin.bi	9a53793ab8	[InstCombine] Fold two select patterns into and-or select (~a \| c), a, b -> and a, (or c, b) https://alive2.llvm.org/ce/z/bnDobs select (~c & b), a, b -> and b, (or a, c) https://alive2.llvm.org/ce/z/k2jJHJ Differential Revision: https://reviews.llvm.org/D122152	2022-03-28 16:07:55 -04:00
Simon Pilgrim	6a094a6264	[InstCombine] SimplifyDemandedUseBits - remove ashr node if we only demand known sign bits We already do this for SelectionDAG, but we're missing it here. Noticed while re-triaging PR21929 Differential Revision: https://reviews.llvm.org/D122340	2022-03-25 15:39:08 +00:00
Sanjay Patel	5dbb53b1b4	[InstCombine] merge shuffled vector negate and multiply Add the "(0 - X) --> (X * -1)" reverse identity to the list of alternate form binops. We need a little hack to make the existing logic work because it does not expect to move constants from op0 to op1, but the code comment hopefully makes that clear. I don't think there are any other identities like that. Fixes #54364 Differential Revision: https://reviews.llvm.org/D122390	2022-03-24 10:25:16 -04:00
Dávid Bolvanský	4397504c2d	[NFCI] Fix set-but-unused warning in InstCombineAddSub.cpp	2022-03-24 08:33:40 +01:00
chenglin.bi	52f323d0f1	[InstCombine] Fold abs of known negative operand when source is sub When abs source comes from (x - y), check if a "x > y" dominating condition exists. Fixes #54132 Differential Revision: https://reviews.llvm.org/D122013	2022-03-23 15:21:33 -04:00
Sanjay Patel	0fcff69bcb	[InstCombine] try to narrow shifted bswap-of-zext (2nd try) The first attempt at this missed a validity check. This version includes a test of the narrow source type for modulo-16-bits. Original commit message: This is the IR counterpart to `370ebc9d9a` which provided a bswap narrowing fix for issue #53867. Here we can be more general (although I'm not sure yet what would happen for illegal types in codegen - too rare to worry about?): https://alive2.llvm.org/ce/z/3-CPfo This will be more effective if we have moved the shift after the bswap as proposed in D122010, but it is independent of that patch. Differential Revision: https://reviews.llvm.org/D122166	2022-03-23 11:28:37 -04:00
Nathan Chancellor	4e0008dcbe	Revert "[InstCombine] try to narrow shifted bswap-of-zext" This reverts commit `9e9bda2e8f`. This causes a backend error when building the Linux kernel for arm64. See https://reviews.llvm.org/D122166 for a simplified reproducer.	2022-03-22 17:32:33 -07:00
Philip Reames	7abefc4222	[instcombine] Fold away memset/memmove from otherwise unused alloca The motivation for this is that while both memcpyopt and dse will catch this case, both are limited by MSSA's walk back threshold when finding clobbers. As such, if you have a memcpy of an otherwise dead alloca placed towards the end of a long basic block with lots of other memory instructions, it would be missed. This is a bit undesirable for such an "obviously" useless bit of code. As noted in comments, we should probably generalize instcombine's escape analysis peephole (see visitAllocInst) to allow read xor write. Doing that would subsume this code in a more general way, but is also a more involved change. For the moment, I went with the easiest fix.	2022-03-22 13:48:48 -07:00
Sanjay Patel	ccf8c969c2	[InstCombine] reorder code, fix formatting; NFC The affected code can be updated to solve #54364, so make some cosmetic diffs before real changes.	2022-03-22 16:33:01 -04:00
Sanjay Patel	60820e53ec	[InstCombine] try to canonicalize logical shift after bswap When shifting by a byte-multiple: bswap (shl X, C) --> lshr (bswap X), C bswap (lshr X, C) --> shl (bswap X), C This is an IR implementation of a transform suggested in D120648. The "swaps cancel" test models the motivating optimization from that proposal. Alive2 checks (as noted in the other review, we could use knownbits to handle shift-by-variable-amount, but that can be an enhancement patch): https://alive2.llvm.org/ce/z/pXUaRf https://alive2.llvm.org/ce/z/ZnaMLf Differential Revision: https://reviews.llvm.org/D122010	2022-03-22 09:10:55 -04:00
Sanjay Patel	9e9bda2e8f	[InstCombine] try to narrow shifted bswap-of-zext This is the IR counterpart to `370ebc9d9a` which provided a bswap narrowing fix for issue #53867. Here we can be more general (although I'm not sure yet what would happen for illegal types in codegen - too rare to worry about?): https://alive2.llvm.org/ce/z/3-CPfo This will be more effective if we have moved the shift after the bswap as proposed in D122010, but it is independent of that patch. Differential Revision: https://reviews.llvm.org/D122166	2022-03-22 08:22:30 -04:00
Nikita Popov	fc8946fae7	[InstCombine] Remove integer SPF of SPF folds (NFCI) Now that we canonicalize to intrinsics, these folds should no longer be needed. Only one fold that also applies to floating-point min/max is retained.	2022-03-18 10:20:48 +01:00
Andrew Wei	0af3e6a22d	[InstCombine] Sink instructions with multiple users in a successor block. This patch tries to sink instructions when they are only used in a successor block. This is a further enhancement patch based on Anna's commit: D109700, which allows sinking an instruction having multiple uses in a single user. In this patch, sink instructions with multiple users in a single successor block will be supported. It could fix a known issue from rust: https://github.com/rust-lang/rust/issues/51346#issuecomment-394443610 Reviewed By: nikic, reames Differential Revision: https://reviews.llvm.org/D121585	2022-03-18 11:53:45 +08:00
Nikita Popov	4010a7a5d0	Reapply [InstCombine] Support switch in phi to cond fold Reapply with an explicit check for multi-edges, as the expected behavior of multi-edge dominance is unclear (D120811). ----- For conditional branches, we know the value is i1 0 or i1 1 along the outgoing edges. For switches we can apply exactly the same optimization, just with the known values determined by the switch cases.	2022-03-17 10:03:09 +01:00
Sanjay Patel	598721f866	[InstCombine] try harder to propagate 'nsz' through fneg-of-select This can be viewed as swapping the select arms: https://alive2.llvm.org/ce/z/jUvFMJ ...so we don't have the 'nsz' problem with the more general fold. This unlocks other folds for the motivating fabs example. This was discussed in issue #38828.	2022-03-15 11:05:29 -04:00
Simon Pilgrim	7e4cf582cf	[InstCombine] Add general constant support to eq/ne icmp(add(X,C1),add(Y,C2)) -> icmp(add(X,C1-C2),Y) fold A further extension for Issue #32161 For eq/ne comparisons - the sign mismatch and bounds constraints are redundant, so if that fold fails, fall back and just fold the constants directly. https://alive2.llvm.org/ce/z/cdodNQ The loop rotation test change looks mostly benign - the backend doesn't seem to suffer? https://gcc.godbolt.org/z/dErMY78To Differential Revision: https://reviews.llvm.org/D121551	2022-03-15 14:17:38 +00:00
Craig Topper	ce78e68261	[InstCombine] Fold select based logic of fcmps with same operands when FMF is present. If we have a logical and/or in select form and the true/false operand is an fcmp with poison generating FMF, we won't be able to fold it to an and/or instruction. This prevents us from optimizing the case where it is a logical operation of two fcmps with identical operands. This patch adds explicit checks for this case that doesn't rely on converting to and/or to do the optimization. It reuses the existing foldLogicOfFCmps, but adds a new flag to disable the other combine that is inside that function. FMF flags from the two FCmps are intersected using the logic added in D121243. The FIXME has been updated to indicate that we can only use a union for the non-select form. This allows us to optimize cases like this from compare-fp-3.c in the gcc torture suite with fast math. void test1 (float x, float y) { if ((x==y) && (x!=y)) link_error0(); } Reviewed By: spatel Differential Revision: https://reviews.llvm.org/D121323	2022-03-14 14:45:07 -07:00
Sanjay Patel	3491f2f4b0	[InstCombine] replace negated operand in fcmp with 0.0 X (any pred) -X --> X (any pred) 0.0 This works with all FP values and preserves FMF. Alive2 examples: https://alive2.llvm.org/ce/z/dj6jhp This can also create one of the patterns that we match as "fabs" as shown in one of the test diffs.	2022-03-10 12:53:32 -05:00
Sanjay Patel	9fac110bf7	Revert "[InstCombine] fold fcmp with lossy casted constant" This reverts commit `9397bdc67e`. This optimization is likely to surprise programmers as seen in post-commit comments, so we should add a clang warning first (that is proposed in D121306).	2022-03-10 10:22:22 -05:00
Simon Pilgrim	808d9d260b	[InstCombine] Add vector support to icmp(add(X,C1),add(Y,C2)) -> icmp(add(X,C1-C2),Y) fold As discussed on Issue #32161 this fold can be generalized a lot more than it currently is, but this patch at least adds vector support. Differential Revision: https://reviews.llvm.org/D121358	2022-03-10 13:30:48 +00:00
Craig Topper	f72fe2ef67	[InstCombine] Preserve FMF in foldLogicOfFCmps. This patch intersects the fast math flags from the two fcmps instead of dropping them. I poked at this a bunch with Alive2 for nnan and ninf flags and it seemed to check out. With the other flags it told me "Couldn't prove the correctness of the transformation". Not sure if I should just preserve nnan and ninf? Reviewed By: spatel, lebedev.ri Differential Revision: https://reviews.llvm.org/D121243	2022-03-09 09:17:09 -08:00
Sanjay Patel	9397bdc67e	[InstCombine] fold fcmp with lossy casted constant This is noted as a missing clang warning in #54222 (and we should still make that enhancement). Alive2 proofs: https://alive2.llvm.org/ce/z/Q8drDq https://alive2.llvm.org/ce/z/pE6LRt I don't see a single conversion for all predicates using "getFCmpCode" logic, so other predicates are left as a TODO item.	2022-03-08 12:41:12 -05:00
Arnold Schwaighofer	dcdc1f29bb	InstCombine: Can't fold a phi arg load into the phi if the load is from a swifterror address `swifterror` addresses are only allowed as operands to load, store, and calls. The following transformation is not allowed. It would create a phi with a `swifterror` address operand. ``` %addr = alloca swifterror i8* br %cond, label %bb1, label %b22 bb1: %val1 = load i8, i8* %addr br exit bb2: %val2 = load i8, i8* %addr br exit exit: %val = phi [%val1, %bb1] [%val2, %bb2] ``` => ``` %addr = alloca swifterror i8* br %cond, label %bb1, label %b22 bb1: br exit bb2: br exit exit: %val_addr = phi [%addr, %bb1] [%addr, %bb2] %val2 = load i8, i8* %val_addr ``` rdar://89865485 Differential Revision: https://reviews.llvm.org/D121217	2022-03-08 09:09:51 -08:00
Augie Fackler	5e4c75db3b	InstructionCombining: avoid eliding mismatched alloc/free pairs Prior to this change LLVM would happily elide a call to any allocation function and a call to any free function operating on the same unused pointer. This can cause problems in some obscure cases, for example if the body of operator::new can be inlined but the body of operator::delete can't, as in this example from jyknight: #include <stdlib.h> #include <stdio.h> int allocs = 0; void *operator new(size_t n) { allocs++; void *mem = malloc(n); if (!mem) abort(); return mem; } __attribute__((noinline)) void operator delete(void *mem) noexcept { allocs--; free(mem); } void deleteit(int *i) { delete i; } int main() { int*i = new int; deleteit(i); if (allocs != 0) printf("MEMORY LEAK! allocs: %d\n", allocs); } This patch addresses the issue by introducing the concept of an allocator function family and uses it to make sure that alloc/free function pairs are only removed if they're in the same family. Differential Revision: https://reviews.llvm.org/D117356	2022-03-04 10:41:10 -05:00
Craig Topper	608161225e	[InstCombine][Analysis] Move getFCmpCode and getPredForFCmpCode to CmpInstAnalysis. NFC The similar getICmpCode and getPredForICmpCode are already there. This moves FP for consistency. I think InstCombine is currently the only user of both. Reviewed By: RKSimon Differential Revision: https://reviews.llvm.org/D120754	2022-03-03 09:33:24 -08:00
Nikita Popov	c1b9667148	[InstCombine] Support opaque pointers in callee bitcast fold To make this actually trigger, we also need to check whether the function types differ, which is a hidden cast under opaque pointers. The transform is somewhat less relevant there because it is primarily about pointer bitcasts, but it can also happen with other bit- or pointer-castable types. Byval handling is easier with opaque pointers because there is no need to adjust the byval type, we only need to make sure that it's still a pointer.	2022-03-03 11:07:39 +01:00
Nikita Popov	6c8adc5054	[InstCombine] Remove unnecessary byval check in callee cast fold The logic for handling this was fixed in `8d7f118ab2`, but the check for byval on the callee was retained. This resulted in a weird situation where the transform would work depending on whether the byval was only on the call or on both the call and the function.	2022-03-03 10:55:14 +01:00
serge-sans-paille	59630917d6	Cleanup includes: Transform/Scalar Estimated impact on preprocessor output line: before: 1062981579 after: 1062494547 Discourse thread: https://discourse.llvm.org/t/include-what-you-use-include-cleanup Differential Revision: https://reviews.llvm.org/D120817	2022-03-03 07:56:34 +01:00
Nikita Popov	61580d0949	Reapply [InstCombine] Remove one-use limitation from X-Y==0 fold This is a recommit without changes. I originally reverted this due to a significant code-size regression on tramp3d-v4, however further investigation showed that in the tramp3d-v4 case this change enables additional optimizations (in particular more jump threading), which happens to reduce the size of a function just enough to be eligible for inlining at hot callsites, which results in the code size increase. As such, this was just bad luck. ----- This one-use limitation is artificial, we do not increase instruction count if we perform the fold with multiple uses. The motivating case is shown in @sub_eq_zero_select, where the one-use limitation causes us to miss a subsequent select fold. I believe the backend is pretty good about reusing flag-producing subs for cmps with same operands, so I think doing this is fine. Differential Revision: https://reviews.llvm.org/D120337	2022-03-02 16:43:33 +01:00
Nikita Popov	5cf06d10f8	Revert "[InstCombine] Support switch in phi to cond fold" This reverts commit `0817ce86b5`. Seeing some ppc64le stage2 failures, reverting to investigate.	2022-03-02 12:49:47 +01:00
Nikita Popov	0817ce86b5	[InstCombine] Support switch in phi to cond fold For conditional branches, we know the value is i1 0 or i1 1 along the outgoing edges. For switches we can apply exactly the same optimization, just with the known values determined by the switch cases.	2022-03-02 12:16:32 +01:00
serge-sans-paille	a494ae43be	Cleanup includes: TransformsUtils Estimation on the impact on preprocessor output: before: 1065307662 after: 1064800684 Discourse thread: https://discourse.llvm.org/t/include-what-you-use-include-cleanup Differential Revision: https://reviews.llvm.org/D120741	2022-03-01 21:00:07 +01:00
Craig Topper	7bc6667845	[Analysis] Simplify the interface to llvm::getICmpCode. NFC Instead of passing an InstCmpInt * and a bool just pass the predicate from the caller. I'm considering moving the similar FCmp functions from InstCombine over here and this makes the interface consistent with what is used for FCmp. Reviewed By: RKSimon Differential Revision: https://reviews.llvm.org/D120609	2022-03-01 09:53:27 -08:00
Nikita Popov	a1f442b278	[InstCombine] Support phi to cond fold with more than two preds This transform can still be applied if there are more than two phi inputs, as long as phi inputs with the same value are dominated by the same idom edge.	2022-03-01 16:31:49 +01:00
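
Several of the integer folds listed above state simple algebraic identities, and a brute-force check over 8-bit values can help build intuition for them. The following is only a sketch, not LLVM code: the function names are hypothetical, the 8-bit width is chosen purely so the loops stay small, and passing does not prove the folds for other widths (the Alive2 links in the commit messages are the actual proofs). For the srem fold (6cf10b7e6e), the mask is assumed here to be the sign bit plus the low bits of the power of two, and only 0 < C < PowerOf2 is exercised; the commit message itself does not spell either detail out.

```cpp
#include <cstdint>

// Count set bits in an 8-bit value (a stand-in for the ctpop intrinsic).
static int popcount8(std::uint8_t x) {
  int n = 0;
  while (x != 0) {
    x = static_cast<std::uint8_t>(x & (x - 1)); // clear the lowest set bit
    ++n;
  }
  return n;
}

// a3cffc1150: (ctpop(X) == 1) | (X == 0)  -->  ctpop(X) < 2
bool checkCtpopFold() {
  for (unsigned v = 0; v < 256; ++v) {
    auto x = static_cast<std::uint8_t>(v);
    bool before = (popcount8(x) == 1) || (x == 0);
    bool after = popcount8(x) < 2;
    if (before != after) return false;
  }
  return true;
}

// 308ca349cb: (X | C2) ^ C1  -->  (X & ~C2) ^ (C1 ^ C2)
bool checkXorOfOrFold() {
  for (unsigned x = 0; x < 256; ++x)
    for (unsigned c1 = 0; c1 < 256; ++c1)
      for (unsigned c2 = 0; c2 < 256; ++c2) {
        unsigned before = ((x | c2) ^ c1) & 0xFFu;
        unsigned after = ((x & ~c2) ^ (c1 ^ c2)) & 0xFFu;
        if (before != after) return false;
      }
  return true;
}

// 6cf10b7e6e: srem(X, PowerOf2) == C  -->  (X & Mask) == C for positive C.
// Assumption: Mask = sign bit | (PowerOf2 - 1), modelled on an 8-bit integer.
bool checkSRemPow2Fold() {
  for (int p = 2; p <= 64; p *= 2)   // positive powers of two that fit in i8
    for (int c = 1; c < p; ++c)      // positive C below the divisor
      for (int x = -128; x <= 127; ++x) {
        bool before = (x % p) == c;  // C++ % truncates toward zero, like srem
        unsigned mask = 0x80u | static_cast<unsigned>(p - 1);
        bool after =
            (static_cast<unsigned>(x) & mask) == static_cast<unsigned>(c);
        if (before != after) return false;
      }
  return true;
}

// 7e4cf582cf: eq icmp(add(X,C1), add(Y,C2))  -->  eq icmp(add(X,C1-C2), Y)
// Constants are sampled rather than enumerated to keep the loop count low.
bool checkAddConstEqFold() {
  const int consts[] = {0, 1, 2, 127, 128, 255};
  for (int c1 : consts)
    for (int c2 : consts)
      for (int x = 0; x < 256; ++x)
        for (int y = 0; y < 256; ++y) {
          bool before = static_cast<std::uint8_t>(x + c1) ==
                        static_cast<std::uint8_t>(y + c2);
          bool after = static_cast<std::uint8_t>(x + (c1 - c2)) ==
                       static_cast<std::uint8_t>(y);
          if (before != after) return false;
        }
  return true;
}
```

Each function returns true when the "before" and "after" forms agree on every input it enumerates, so a caller can simply invoke all four and report any that return false.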


5134 Commits