llvm-project

Commit Graph

Author	SHA1	Message	Date
Juneyoung Lee	c038845f58	[InstCombine] Fold icmp (select c,const,arg), null if icmp arg, null can be simplified This patch folds icmp (select c,const,arg), null if icmp arg, null can be simplified. Resolves llvm.org/pr48975. Reviewed By: nikic, xbolva00 Differential Revision: https://reviews.llvm.org/D96663	2021-06-21 17:39:05 +09:00
hyeongyukim	69b0ed9a0a	[InstCombine] Fix miscompile on GEP+load to icmp fold (PR45210) As noted in PR45210: https://bugs.llvm.org/show_bug.cgi?id=45210 ...the bug is triggered, as Eli says, when sext(idx) * ElementSize overflows. ``` // assume that GV is an array of 4-byte elements GEP = gep GV, 0, Idx // this is accessing Idx * 4 L = load GEP ICI = icmp eq L, value => ICI = icmp eq Idx, NewIdx ``` The foldCmpLoadFromIndexedGlobal function simplifies GEP+load operation to icmp. And there is a problem because Idx * ElementSize can overflow. Let's assume that the wanted value is at offset 0. Then, there are actually four possible values for Idx to match offset 0: 0x00..00, 0x40..00, 0x80..00, 0xC0..00. We should return true for all these values, but currently, the new icmp only returns true for 0x00..00. This problem can be solved by masking off (trailing zeros of ElementSize) bits from Idx. ``` ... => Idx' = and Idx, 0x3F..FF ICI = icmp eq Idx', NewIdx ``` Reviewed By: efriedma Differential Revision: https://reviews.llvm.org/D99481	2021-06-17 19:46:17 +09:00
Nathan Chancellor	e6b086bef2	Revert "[InstCombine] Fix miscompile on GEP+load to icmp fold (PR45210)" This reverts commit `4f2fd3818b`. The Linux kernel fails to build after this commit. See https://reviews.llvm.org/D99481 for a reproducer. Signed-off-by: Nathan Chancellor <nathan@kernel.org>	2021-05-31 20:21:26 -07:00
Hyeongyu Kim	4f2fd3818b	[InstCombine] Fix miscompile on GEP+load to icmp fold (PR45210) As noted in PR45210: https://bugs.llvm.org/show_bug.cgi?id=45210 ...the bug is triggered, as Eli says, when sext(idx) * ElementSize overflows. ``` // assume that GV is an array of 4-byte elements GEP = gep GV, 0, Idx // this is accessing Idx * 4 L = load GEP ICI = icmp eq L, value => ICI = icmp eq Idx, NewIdx ``` The foldCmpLoadFromIndexedGlobal function simplifies GEP+load operation to icmp. And there is a problem because Idx * ElementSize can overflow. Let's assume that the wanted value is at offset 0. Then, there are actually four possible values for Idx to match offset 0: 0x00..00, 0x40..00, 0x80..00, 0xC0..00. We should return true for all these values, but currently, the new icmp only returns true for 0x00..00. This problem can be solved by masking off (trailing zeros of ElementSize) bits from Idx. ``` ... => Idx' = and Idx, 0x3F..FF ICI = icmp eq Idx', NewIdx ``` Reviewed By: efriedma Differential Revision: https://reviews.llvm.org/D99481	2021-05-31 14:08:20 +09:00
Nikita Popov	9a9421a461	Reapply [InstCombine] Fold multiuse shr eq zero This was reverted due to performance regressions in ARM benchmarks, which have since been addressed by D101196 (SCEV analysis improvement) and D101778 (CGP reverse transform). ----- The single-use case is handled implicitly by converting the icmp into a mask check first. When comparing with zero in particular, we don't need the one-use restriction, as we only produce a single icmp. https://alive2.llvm.org/ce/z/MSixcm https://alive2.llvm.org/ce/z/GwpG0M	2021-05-22 14:46:50 +02:00
Sanjay Patel	a6f79b5671	[InstCombine] avoid infinite loops with select/icmp transforms This fixes https://llvm.org/PR48900 , but as seen in the regression tests prevents some optimizations. There are a few options to restore those (switch to min/max intrinsics, add larger pattern matching for select with dominating condition, improve CVP), but we need to prevent the bug 1st.	2021-05-04 11:54:06 -04:00
Nikita Popov	24e9fbc1a3	Revert "[InstCombine] Fold multiuse shr eq zero" This reverts commit `9423f78240`. A performance regression with this patch has been reported at https://reviews.llvm.org/rG9423f78240a2#990953. Reverting for now.	2021-04-21 21:40:52 +02:00
Reid Kleckner	91f7a4fff7	Revert "[InstCombine] Recognize `((x * y) s/ x) !=/== y` as an signed multiplication overflow check (PR48769)" This reverts commit `13ec913bdf`. This commit introduces new uses of the overflow checking intrinsics that depend on implementations in compiler-rt, which Windows users generally do not link against. I filed an issue (somewhere) to make clang auto-link the builtins library to resolve this situation, but until that happens, it isn't reasonable for the optimizer to introduce new link time dependencies.	2021-04-20 15:53:34 -07:00
Roman Lebedev	13ec913bdf	[InstCombine] Recognize `((x * y) s/ x) !=/== y` as an signed multiplication overflow check (PR48769) We already had support for its unsigned variant, so simply extend it to also handle the signed variant. Fixes https://bugs.llvm.org/show_bug.cgi?id=48769	2021-04-20 21:29:43 +03:00
Nikita Popov	9423f78240	[InstCombine] Fold multiuse shr eq zero The single-use case is handled implicitly by converting the icmp into a mask check first. When comparing with zero in particular, we don't need the one-use restriction, as we only produce a single icmp. https://alive2.llvm.org/ce/z/MSixcm https://alive2.llvm.org/ce/z/GwpG0M	2021-04-19 22:13:11 +02:00
Mehrnoosh Heidarpour	29f189f90d	[InstCombine] Conditionally emit nowrap flags when combining two adds Currently, the InstCombineCompare is combining two add operations into a single add operation which always has a nsw flag, without checking the conditions to see if this flag should be present according to the original two add operations or not. This patch will change the InstCombineCompare to emit the nsw or nuw only when these flags are allowed to be generated according to the original add operations and remove the possibility of applying wrong optimization with passes that will perform on the IR later in the pipeline. To confirm that the current results are buggy and the results after proposed patch are the correct IR the following examples from Alive2 are attached; the same results can be seen in the case of nuw flag and nsw is just used as an example. The following link shows that the generated IR with current LLVM is a buggy IR when none of the original add operations have nsw flag. https://alive2.llvm.org/ce/z/WGaDrm The following link proves that the generated IR after the patch in the former case is the correct IR. https://alive2.llvm.org/ce/z/wQ7G_e Differential Revision: https://reviews.llvm.org/D100095	2021-04-14 20:53:06 +02:00
Sanjay Patel	5354a213a0	[InstCombine] fold shift+trunc signbit check https://alive2.llvm.org/ce/z/6vQvrP This solves: https://llvm.org/PR49866	2021-04-12 16:19:43 -04:00
Sanjay Patel	85294703a7	[InstCombine] fold fcmp-of-copysign idiom As discussed in: https://llvm.org/PR49179 ...this pattern shows up in library code. There are several potential generalizations as noted, but we need to be careful that we get FP special-values right, and it's not clear how much variation we should expect to see from this exact idiom.	2021-02-17 10:32:33 -05:00
Roman Lebedev	4ed0d8f2f0	[NFC][InstCombine] Extract freelyInvertAllUsersOf() out of canonicalizeICmpPredicate() I'd like to use it in an upcoming fold.	2021-01-22 17:23:53 +03:00
Sanjay Patel	288f3fc5df	[InstCombine] reduce icmp(ashr X, C1), C2 to sign-bit test This is a more basic pattern that we should handle before trying to solve: https://llvm.org/PR48640 There might be a better way to think about this because the pre-condition that I came up with (number of sign bits in the compare constant) misses a potential transform for each of ugt and ult as commented on in the test file. Tried to model this in Alive: https://rise4fun.com/Alive/juX1 ...but I couldn't get the ComputeNumSignBits() pre-condition to work as expected, so replaced with leading 0/1 preconditions instead. Name: ugt Pre: countLeadingZeros(C2) <= C1 && countLeadingOnes(C2) <= C1 %a = ashr %x, C1 %r = icmp ugt i8 %a, C2 => %r = icmp slt i8 %x, 0 Name: ult Pre: countLeadingZeros(C2) <= C1 && countLeadingOnes(C2) <= C1 %a = ashr %x, C1 %r = icmp ult i4 %a, C2 => %r = icmp sgt i4 %x, -1 Also approximated in Alive2: https://alive2.llvm.org/ce/z/u5hCcz https://alive2.llvm.org/ce/z/__szVL Differential Revision: https://reviews.llvm.org/D94014	2021-01-11 15:53:39 -05:00
Florian Hahn	c701f85c45	[STLExtras] Use return type from operator* of the wrapped iter. Currently make_early_inc_range cannot be used with iterators with operator* implementations that do not return a reference. Most notably in the LLVM codebase, this means the User iterator ranges cannot be used with make_early_inc_range, which slightly simplifies iterating over ranges while elements are removed. Instead of directly using BaseT::reference as return type of operator*, this patch uses decltype to get the actual return type of the operator* implementation in WrappedIteratorT. This patch also updates a few places to make use of make_early_inc_range. Reviewed By: dblaikie Differential Revision: https://reviews.llvm.org/D93992	2021-01-10 14:41:13 +00:00
Kazu Hirata	33bf1cad75	[llvm] Use *Set::contains (NFC)	2021-01-07 20:29:34 -08:00
Juneyoung Lee	29f8628d1f	[Constant] Add containsPoisonElement This patch - Adds containsPoisonElement that checks existence of poison in constant vector elements, - Renames containsUndefElement to containsUndefOrPoisonElement to clarify its behavior & updates its uses properly With this patch, isGuaranteedNotToBeUndefOrPoison's tests w.r.t constant vectors are added because its analysis is improved. Thanks! Reviewed By: nikic Differential Revision: https://reviews.llvm.org/D94053	2021-01-06 12:10:33 +09:00
Simon Pilgrim	313d982df6	[IR] Add ConstantInt::getBool helpers to wrap getTrue/getFalse.	2021-01-05 11:01:10 +00:00
Simon Pilgrim	89abe1cf83	[InstCombine] foldICmpUsingKnownBits - use KnownBits signed/unsigned getMin/MaxValue helpers. NFCI. Replace the local compute*SignedMinMaxValuesFromKnownBits methods with the equivalent KnownBits helpers to determine the min/max value ranges.	2020-12-24 14:22:26 +00:00
Jun Ma	e12f584578	[InstCombine] Remove scalable vector restriction in InstCombineCompares Differential Revision: https://reviews.llvm.org/D93269	2020-12-15 20:36:57 +08:00
LemonBoy	42732d33cc	[InstCombine] Fix constant-folding of overflowing arithmetic ops on vectors Feeding vector values to `InstCombiner::OptimizeOverflowCheck` produces a scalar boolean flag if it proves the overflow check can be eliminated. This causes `InstCombiner::CreateOverflowTuple` to crash as it correctly expects a vector of i1 values instead. Reviewed By: lebedev.ri Differential Revision: https://reviews.llvm.org/D89628	2020-11-09 14:41:07 +03:00
Roman Lebedev	8d0fdd36a3	[IR] CmpInst: Add getFlippedSignednessPredicate() And refactor a few places to use it	2020-11-06 11:31:09 +03:00
Sanjay Patel	5a6e66ec72	[InstCombine] add folds for icmp+ctpop https://alive2.llvm.org/ce/z/XjFPQJ define void @src(i64 %value) { %t0 = call i64 @llvm.ctpop.i64(i64 %value) %gt = icmp ugt i64 %t0, 63 %lt = icmp ult i64 %t0, 64 call void @use(i1 %gt, i1 %lt) ret void } define void @tgt(i64 %value) { %eq = icmp eq i64 %value, -1 %ne = icmp ne i64 %value, -1 call void @use(i1 %eq, i1 %ne) ret void } declare i64 @llvm.ctpop.i64(i64) #1 declare void @use(i1, i1)	2020-10-26 16:48:56 -04:00
Sanjay Patel	437d7551c5	[InstCombine] reduce code duplication in icmp intrinsic folds; NFC	2020-10-26 16:48:56 -04:00
Caroline Concatto	2415636475	[SVE]Clarify TypeSize comparisons in llvm/lib/Transforms Use isKnownXY comparators when one of the operands can be with scalable vectors or getFixedSize() for all the other cases. This patch also does bug fixes for getPrimitiveSizeInBits by using getFixedSize() near the places with the TypeSize comparison. Differential Revision: https://reviews.llvm.org/D89703	2020-10-23 09:15:17 +01:00
Simon Pilgrim	17b9a91ec2	[InstCombine] canRewriteGEPAsOffset - don't dereference a dyn_cast<>. NFCI. We know V is an IntToPtrInst or PtrToIntInst type so we know it's a CastInst - so use cast<> directly. Prevents clang static analyzer warning that we could dereference a null pointer.	2020-10-06 14:48:34 +01:00
Simon Pilgrim	567049f892	[InstCombine] Use m_FAbs matcher helper. NFCI.	2020-10-01 14:42:34 +01:00
Huihui Zhang	9ad6049736	[InstCombine][SVE] Skip scalable type for InstCombiner::getFlippedStrictnessPredicateAndConstant. We cannot iterate over a scalable vector because the number of elements is unknown at compile time. Reviewed By: efriedma Differential Revision: https://reviews.llvm.org/D87918	2020-09-18 11:26:36 -07:00
Nikita Popov	f6b87da0c7	[InstCombine] Fold comparison of abs with int min If the abs is poisoning, this is already folded to true/false. For non-poisoning abs, we can convert this to a comparison with the operand.	2020-09-08 20:23:03 +02:00
Sanjay Patel	7a6d6f0f70	[InstCombine] improve folds for icmp with multiply operands (PR47432) Check for no overflow along with an odd constant before we lose information by converting to bitwise logic. https://rise4fun.com/Alive/2Xl Pre: C1 != 0 %mx = mul nsw i8 %x, C1 %my = mul nsw i8 %y, C1 %r = icmp eq i8 %mx, %my => %r = icmp eq i8 %x, %y Name: nuw ne Pre: C1 != 0 %mx = mul nuw i8 %x, C1 %my = mul nuw i8 %y, C1 %r = icmp ne i8 %mx, %my => %r = icmp ne i8 %x, %y Name: odd ne Pre: C1 % 2 != 0 %mx = mul i8 %x, C1 %my = mul i8 %y, C1 %r = icmp ne i8 %mx, %my => %r = icmp ne i8 %x, %y	2020-09-07 12:40:37 -04:00
Nikita Popov	ada8a17d94	[InstCombine] Fold abs intrinsic eq zero Following the same transform for the select version of abs.	2020-09-05 15:11:38 +02:00
Christopher Tetreault	640f20b0c7	[SVE] Remove calls to VectorType::getNumElements from InstCombine Reviewed By: efriedma Differential Revision: https://reviews.llvm.org/D82237	2020-08-31 12:59:10 -07:00
Roman Lebedev	e65f213178	[InstCombine] canonicalizeICmpPredicate(): use InstCombiner::replaceInstUsesWith() instead of RAUW We really shouldn't use RAUW in InstCombine because we should consistently update Worklist to avoid extra iterations.	2020-08-29 15:10:14 +03:00
Benjamin Kramer	b98e25b6d7	Make helpers static. NFC.	2020-08-19 16:00:03 +02:00
Roman Lebedev	a512c89476	[NFC][InstCombine] Refactor '(-NSW x) pred x' fold	2020-08-06 11:50:36 +03:00
Roman Lebedev	141357663e	[InstCombine] (-NSW x) u<= x --> x s<=0 (PR39480) Name: (-x) u<= x --> x s<= 0 %neg_x = sub nsw i8 0, %x ; %x must not be INT_MIN %r = icmp ule i8 %neg_x, %x => %r = icmp sle i8 %x, 0 https://rise4fun.com/Alive/V22 https://bugs.llvm.org/show_bug.cgi?id=39480	2020-08-06 11:50:36 +03:00
Roman Lebedev	132be1f502	[InstCombine] (-NSW x) u< x --> x s< 0 (PR39480) Name: (-x) u< x --> x s< 0 %neg_x = sub nsw i8 0, %x ; %x must not be INT_MIN %r = icmp ult i8 %neg_x, %x => %r = icmp slt i8 %x, 0 https://rise4fun.com/Alive/zSuf https://bugs.llvm.org/show_bug.cgi?id=39480	2020-08-06 11:50:36 +03:00
Roman Lebedev	0e1241a3c9	[InstCombine] (-NSW x) u>= x --> x s>= 0 (PR39480) Name: (-x) u>= x --> x s>= 0 %neg_x = sub nsw i8 0, %x ; %x must not be INT_MIN %r = icmp uge i8 %neg_x, %x => %r = icmp sge i8 %x, 0 https://rise4fun.com/Alive/LLHd https://bugs.llvm.org/show_bug.cgi?id=39480	2020-08-06 11:50:35 +03:00
Roman Lebedev	16c642fa39	[InstCombine] (-NSW x) u> x --> x s> 0 (PR39480) Name: (-x) u> x --> x s> 0 %neg_x = sub nsw i8 0, %x ; %x must not be INT_MIN %r = icmp ugt i8 %neg_x, %x => %r = icmp sgt i8 %x, 0 https://rise4fun.com/Alive/Raea https://bugs.llvm.org/show_bug.cgi?id=39480	2020-08-06 11:50:35 +03:00
Roman Lebedev	59387c0dd7	[InstCombine] (-NSW x) s<= x --> x s>= 0 (PR39480) Name: (-x) s<= x --> x >= 0 %neg_x = sub nsw i8 0, %x ; %x must not be INT_MIN %r = icmp sle i8 %neg_x, %x => %r = icmp sge i8 %x, 0 https://rise4fun.com/Alive/91k https://bugs.llvm.org/show_bug.cgi?id=39480	2020-08-06 11:50:35 +03:00
Roman Lebedev	01a6c4bd26	[InstCombine] (-NSW x) s< x --> x s> 0 (PR39480) Name: (-x) s< x --> x > 0 %neg_x = sub nsw i8 0, %x ; %x must not be INT_MIN %r = icmp slt i8 %neg_x, %x => %r = icmp sgt i8 %x, 0 https://rise4fun.com/Alive/3IXb https://bugs.llvm.org/show_bug.cgi?id=39480	2020-08-06 11:50:35 +03:00
Roman Lebedev	3885207651	[InstCombine] (-NSW x) s>= x --> x s<= 0 (PR39480) Name: (-x) s>= x --> x s<= 0 %neg_x = sub nsw i8 0, %x ; %x must not be INT_MIN %r = icmp sge i8 %neg_x, %x => %r = icmp sle i8 %x, 0 https://rise4fun.com/Alive/Hdip https://bugs.llvm.org/show_bug.cgi?id=39480	2020-08-06 11:50:34 +03:00
Roman Lebedev	8878b79cfe	[InstCombine] (-NSW x) ==/!= x --> x ==/!= 0 (PR39480) Name: (-x) == x --> x == 0 %neg_x = sub nsw i8 0, %x ; %x must not be INT_MIN %r = icmp eq i8 %neg_x, %x => %r = icmp eq i8 %x, 0 Name: (-x) != x --> x != 0 %neg_x = sub nsw i8 0, %x ; %x must not be INT_MIN %r = icmp ne i8 %neg_x, %x => %r = icmp ne i8 %x, 0 https://rise4fun.com/Alive/4slH https://bugs.llvm.org/show_bug.cgi?id=39480	2020-08-06 11:50:34 +03:00
Roman Lebedev	5060f5682b	[InstCombine] (-NSW x) s> x --> x s< 0 (PR39480) Name: (-x) s> x --> x s< 0 %neg_x = sub nsw i8 0, %x ; %x must not be INT_MIN %r = icmp sgt i8 %neg_x, %x => %r = icmp slt i8 %x, 0 https://rise4fun.com/Alive/ZslD https://bugs.llvm.org/show_bug.cgi?id=39480	2020-08-06 11:50:34 +03:00
Sanjay Patel	c66169136f	[InstCombine] fold icmp with 'mul nsw/nuw' and constant operands This also removes a more specific fold that only handled icmp with 0. https://rise4fun.com/Alive/sdM9 Name: mul nsw with icmp eq Pre: (C1 != 0) && (C2 % C1) == 0 %a = mul nsw i8 %x, C1 %r = icmp eq i8 %a, C2 => %r = icmp eq i8 %x, C2 / C1 Name: mul nuw with icmp eq Pre: (C1 != 0) && (C2 %u C1) == 0 %a = mul nuw i8 %x, C1 %r = icmp eq i8 %a, C2 => %r = icmp eq i8 %x, C2 /u C1 Name: mul nsw with icmp ne Pre: (C1 != 0) && (C2 % C1) == 0 %a = mul nsw i8 %x, C1 %r = icmp ne i8 %a, C2 => %r = icmp ne i8 %x, C2 / C1 Name: mul nuw with icmp ne Pre: (C1 != 0) && (C2 %u C1) == 0 %a = mul nuw i8 %x, C1 %r = icmp ne i8 %a, C2 => %r = icmp ne i8 %x, C2 /u C1	2020-08-05 17:29:32 -04:00
Vitaly Buka	b0eb40ca39	[NFC] Remove unused GetUnderlyingObject parameter Depends on D84617. Differential Revision: https://reviews.llvm.org/D84621	2020-07-31 02:10:03 -07:00
Vitaly Buka	89051ebace	[NFC] GetUnderlyingObject -> getUnderlyingObject I am going to touch them in the next patch anyway	2020-07-30 21:08:24 -07:00
Sebastian Neubauer	2a6c871596	[InstCombine] Move target-specific inst combining For a long time, the InstCombine pass handled target specific intrinsics. Having target specific code in general passes was noted as an area for improvement for a long time. D81728 moves most target specific code out of the InstCombine pass. Applying the target specific combinations in an extra pass would probably result in inferior optimizations compared to the current fixed-point iteration, therefore the InstCombine pass resorts to newly introduced functions in the TargetTransformInfo when it encounters unknown intrinsics. The patch should not have any effect on generated code (under the assumption that code never uses intrinsics from a foreign target). This introduces three new functions: TargetTransformInfo::instCombineIntrinsic TargetTransformInfo::simplifyDemandedUseBitsIntrinsic TargetTransformInfo::simplifyDemandedVectorEltsIntrinsic A few target specific parts are left in the InstCombine folder, where it makes sense to share code. The largest left-over part in InstCombineCalls.cpp is the code shared between arm and aarch64. This allows to move about 3000 lines out from InstCombine to the targets. Differential Revision: https://reviews.llvm.org/D81728	2020-07-22 15:59:49 +02:00
Sanjay Patel	3b8ae1001f	[InstCombine] fix miscompile from umul_with_overflow matching As noted in PR46561: https://bugs.llvm.org/show_bug.cgi?id=46561 ...it takes something beyond a minimal IR example to trigger this bug because it relies on matching non-canonical IR. There are no tests that show the need for matching this pattern, so I'm just deleting it to fix the miscompile.	2020-07-04 11:16:23 -04:00
Roman Lebedev	c3b8bd1eea	[InstCombine] Always try to invert non-canonical predicate of an icmp Summary: The actual transform i was going after was: https://rise4fun.com/Alive/Tp9H ``` Name: zz Pre: isPowerOf2(C0) && isPowerOf2(C1) && C1 == C0 %t0 = and i8 %x, C0 %r = icmp eq i8 %t0, C1 => %t = icmp eq i8 %t0, 0 %r = xor i1 %t, -1 Name: zz Pre: isPowerOf2(C0) %t0 = and i8 %x, C0 %r = icmp ne i8 %t0, 0 => %t = icmp eq i8 %t0, 0 %r = xor i1 %t, -1 ``` but as it can be seen from the current tests, we already canonicalize most of it, and we are only missing handling multi-use non-canonical icmp predicates. If we have both `!=0` and `==0`, even though we can CSE them, we end up being stuck with them. We should canonicalize to the `==0`. I believe this is one of the cleanup steps i'll need after `-scalarizer` if i end up proceeding with my WIP alloca promotion helper pass. Reviewers: spatel, jdoerfert, nikic Reviewed By: nikic Subscribers: zzheng, hiraditya, llvm-commits Tags: #llvm Differential Revision: https://reviews.llvm.org/D83139	2020-07-04 18:12:04 +03:00
Sanjay Patel	46a285ad9e	[IRBuilder] add/use wrapper to create a generic compare based on predicate type; NFC The predicate can always be used to distinguish between icmp and fcmp, so we don't need to keep repeating this check in the callers.	2020-06-18 15:47:06 -04:00
Sam Parker	5bf0858c0b	Return "[InstCombine] Simplify compare of Phi with constant inputs against a constant" I originally reverted the patch because it was causing performance issues, but now I think it's just enabling simplify-cfg to do something that I don't want instead :) Sorry for the noise. This reverts commit `3e39760f8e`.	2020-06-17 11:38:59 +01:00
Sam Parker	3e39760f8e	Revert "Return "[InstCombine] Simplify compare of Phi with constant inputs against a constant"" This reverts commit `23291b9863`. This caused performance regressions.	2020-06-15 07:46:28 +01:00
Max Kazantsev	23291b9863	Return "[InstCombine] Simplify compare of Phi with constant inputs against a constant" This reverts commit `c4b5a66e44`. Returning along with Clang test fix	2020-06-05 20:48:29 +07:00
Kadir Cetinkaya	c4b5a66e44	Revert "[InstCombine] Simplify compare of Phi with constant inputs against a constant" This reverts commit `16b7eb6dd1`. Breaks build bots, see http://lab.llvm.org:8011/builders/clang-x86_64-debian-fast/builds/29888 for an example.	2020-06-05 13:02:35 +02:00
Max Kazantsev	16b7eb6dd1	[InstCombine] Simplify compare of Phi with constant inputs against a constant We can simplify ``` icmp <pred> phi(C1, C2, ...), C ``` with ``` phi(icmp(C1, C), icmp(C2, C), ...) ``` provided that all comparison of constants are constants themselves. Differential Revision: https://reviews.llvm.org/D81151 Reviewed By: lebedev.ri	2020-06-05 17:02:47 +07:00
Max Kazantsev	80cb25cbd5	Revert "[InstCombine][NFC] Factor out constant check" This reverts commit `9bdb918890`. This refactoring proved to not be useful.	2020-06-05 12:00:44 +07:00
Max Kazantsev	9bdb918890	[InstCombine][NFC] Factor out constant check We plan to add more transforms here. Besides, this check should be done in the beginning just from function's name.	2020-06-04 18:54:23 +07:00
Christopher Tetreault	8f8029b458	[SVE] Eliminate calls to default-false VectorType::get() from InstCombine Reviewers: efriedma, david-arm, fpetrogalli, spatel Reviewed By: david-arm Subscribers: tschuett, hiraditya, rkruppe, psnobl, llvm-commits Tags: #llvm Differential Revision: https://reviews.llvm.org/D80334	2020-05-29 15:31:31 -07:00
Sanjay Patel	7eed772a27	[PatternMatch] abbreviate vector inst matchers; NFC Readability is not reduced with these opcodes/match lines, so reduce odds of awkward wrapping from 80-col limit.	2020-05-24 09:19:47 -04:00
Sanjay Patel	4abab5c5ca	[InstCombine] generalize canonicalization of masked equality comparisons (X | MaskC) == C --> (X & ~MaskC) == C ^ MaskC (X | MaskC) != C --> (X & ~MaskC) != C ^ MaskC We have more analysis for 'and' patterns and already lean this way in the existing code, so this should be neutral or better in IR. If this does not do as well in codegen, the problem already exists and we should fix that based on target costs/heuristics. http://volta.cs.utah.edu:8080/z/oP3ecL define void @src(i8 %x, i8 %OrC, i8 %C, i1* %p0, i1* %p1) { %or = or i8 %x, %OrC %eq = icmp eq i8 %or, %C store i1 %eq, i1* %p0 %ne = icmp ne i8 %or, %C store i1 %ne, i1* %p1 ret void } define void @tgt(i8 %x, i8 %OrC, i8 %C, i1* %p0, i1* %p1) { %NotOrC = xor i8 %OrC, -1 %a = and i8 %x, %NotOrC %NewC = xor i8 %C, %OrC %eq = icmp eq i8 %a, %NewC store i1 %eq, i1* %p0 %ne = icmp ne i8 %a, %NewC store i1 %ne, i1* %p1 ret void }	2020-04-25 11:31:57 -04:00
Eric Christopher	45dca04395	Exclude bitcast and ext/trunc signbit optimization on ppc_fp128 Revision `a1c05fe` <https://reviews.llvm.org/rGa1c05fe20f3def1f1be9f50d2adefc6b6f1578ad> removed bitcast from the list of problematic transformations, however: %97 = fptrunc ppc_fp128 %2 to double // we need to check ppc_fp128 here to prevent the transformation %98 = bitcast double %97 to i64 // `a1c05fe` checks ppc_fp128 here %99 = icmp slt i64 %98, 0 %100 = zext i1 %99 to i8 store i8 %100, i8* %7, align 1 so this patch does that. I'm also disabling it in the presence of extend just in case. I verified separately that the hash of -std::infinity and std::infinity don't match now. Differential Revision: https://reviews.llvm.org/D77911	2020-04-10 17:07:55 -07:00
Christopher Tetreault	155740cc33	Clean up usages of asserting vector getters in Type Summary: Remove usages of asserting vector getters in Type in preparation for the VectorType refactor. The existence of these functions complicates the refactor while adding little value. Reviewers: sdesmalen, rriddle, efriedma Reviewed By: sdesmalen Subscribers: hiraditya, llvm-commits Tags: #llvm Differential Revision: https://reviews.llvm.org/D77263	2020-04-08 15:15:41 -07:00
Sanjay Patel	a1c05fe20f	[InstCombine] exclude bitcast of ppc_fp128 in icmp signbit fold Based on the post-commit comments for rG0f56bbc, there might be a problem with this transform: (bitcast (fpext/fptrunc X)) to iX) < 0 --> (bitcast X to iY) < 0 ...and the ppc_fp128 data type, so conservatively bypass if we are bitcasting a ppc_fp128. We might be able to account for endian or other differences to enable this for PowerPC again if that is useful. Differential Revision: https://reviews.llvm.org/D77642	2020-04-08 08:56:19 -04:00
Sanjay Patel	12fcbcecff	[InstCombine] add tests for cmyk benchmark; NFC These are versions of a function that regressed with: rGf2fbdf76d8d0 That particular problem occurs with an instcombine-simplifycfg-instcombine sequence, but we can show that it exists within instcombine only with other variations of the pattern.	2020-04-02 13:00:46 -04:00
Sanjay Patel	1008435f3d	Revert "[InstCombine] do not exclude min/max from icmp with casted operand fold" This reverts commit `f2fbdf76d8`. As noted in the post-commit thread: https://reviews.llvm.org/rGf2fbdf76d8d0 ...this can obscure a min/max pattern where the components have extra uses. We can show that the problem is independent of this change with a slightly modified source example, so this revert just delays/reduces the need to fix the real problem. We need to improve our analysis of negation or -- more generally -- subtraction using patches like D77230 or D68408.	2020-04-02 09:15:23 -04:00
Eli Friedman	1ee6ec2bf3	Remove "mask" operand from shufflevector. Instead, represent the mask as out-of-line data in the instruction. This should be more efficient in the places that currently use getShuffleVector(), and paves the way for further changes to add new shuffles for scalable vectors. This doesn't change the syntax in textual IR. And I don't currently plan to change the bitcode encoding in this patch, although we'll probably need to do something once we extend shufflevector for scalable types. I expect that once this is finished, we can then replace the raw "mask" with something more appropriate for scalable vectors. Not sure exactly what this looks like at the moment, but there are a few different ways we could handle it. Maybe we could try to describe specific shuffles. Or maybe we could define it in terms of a function to convert a fixed-length array into an appropriate scalable vector, using a "step", or something like that. Differential Revision: https://reviews.llvm.org/D72467	2020-03-31 13:08:59 -07:00
Sanjay Patel	f2fbdf76d8	[InstCombine] do not exclude min/max from icmp with casted operand fold InstCombine has a mess of logic that tries to preserve min/max patterns, but AFAICT, this one is not necessary because we can always narrow the corresponding select in this sequence to match the narrow compare. The biggest danger for this patch is inducing infinite looping or assert from exceeding max iterations. If any bots hit that in the vicinity of this commit, this is the likely patch to blame.	2020-03-30 16:10:51 -04:00
Nikita Popov	8253a86b65	[InstCombine] Erase old mul when creating umulo As we don't return the result of replaceInstUsesWith(), we are responsible for erasing the instruction. There is a small subtlety here in that we need to do this after the other uses of Builder, which uses the original multiply as the insertion point. NFC apart from worklist order changes.	2020-03-29 20:46:08 +02:00
Nikita Popov	a9ddcd6411	[InstCombine] Erase old add when optimizing add overflow We don't return the replaceInstUsesWith() result, so we're responsible for cleaning up. NFC apart from worklist order changes.	2020-03-29 20:20:14 +02:00
Nikita Popov	6f07a9e80a	[InstCombine] Erase original add when creating saddo Usually when we replaceInstUsesWith() we also return the original instruction, and InstCombine will take care of erasing it. Here we don't do that, so we need to manually erase it. NFC apart from worklist order changes.	2020-03-29 18:01:32 +02:00
Nikita Popov	1e363023b8	[InstCombine] Use replaceOperand() in a few more places To make sure the old operands get DCEd. NFC apart from worklist order changes.	2020-03-29 18:01:00 +02:00
Sanjay Patel	0f56bbc1a5	[InstCombine] reduce FP-casted and bitcasted signbit check PR45305: https://bugs.llvm.org/show_bug.cgi?id=45305 Alive2 proofs: http://volta.cs.utah.edu:8080/z/bVyrko http://volta.cs.utah.edu:8080/z/Vxpz9q	2020-03-27 17:33:59 -04:00
Huihui Zhang	118abf2017	[SVE] Update API ConstantVector::getSplat() to use ElementCount. Summary: Support ConstantInt::get() and Constant::getAllOnesValue() for scalable vector type, this requires ConstantVector::getSplat() to take in 'ElementCount', instead of 'unsigned' number of element count. This change is needed for D73753. Reviewers: sdesmalen, efriedma, apazos, spatel, huntergr, willlovett Reviewed By: efriedma Subscribers: tschuett, hiraditya, rkruppe, psnobl, cfe-commits, llvm-commits Tags: #llvm Differential Revision: https://reviews.llvm.org/D74386	2020-03-12 13:22:41 -07:00
Jay Foad	11d1573bb6	[APFloat] Make use of new overloaded comparison operators. NFC. Reviewers: ekatz, spatel, jfb, tlively, craig.topper, RKSimon, nikic, scanon Subscribers: arsenm, jvesely, nhaehnle, hiraditya, dexonsmith, kerbowa, llvm-commits Tags: #llvm Differential Revision: https://reviews.llvm.org/D75744	2020-03-06 16:42:53 +00:00
Jay Foad	f41e82c82c	[InstCombine] Fix confusing variable name.	2020-02-27 11:27:49 +00:00
Roman Lebedev	2855c8fed9	[InstCombine] foldShiftIntoShiftInAnotherHandOfAndInICmp(): fix miscompile (PR44802) Much like with reassociateShiftAmtsOfTwoSameDirectionShifts(), as input, we have the following pattern: icmp eq/ne (and ((x shift Q), (y oppositeshift K))), 0 We want to rewrite that as: icmp eq/ne (and (x shift (Q+K)), y), 0 iff (Q+K) u< bitwidth(x) While we know that originally (Q+K) would not overflow (because 2 * (N-1) u<= iN -1), we may have looked past extensions of shift amounts, so it may now overflow in a smaller bitwidth. To ensure that does not happen, we need to ensure that the total maximal shift amount is still representable in that smaller bitwidth. If the overflow happened, the (Q+K) u< bitwidth(x) check would be bogus. https://bugs.llvm.org/show_bug.cgi?id=44802	2020-02-25 18:23:58 +03:00
Florian Hahn	7769030b93	Recommit "[PatternMatch] Match XOR variant of unsigned-add overflow check." This version fixes a buildbot failure caused by picking the wrong insert point for XORs. We cannot pick the XOR binary operator as insert point, as it is not guaranteed that both input operands for the overflow intrinsic are defined before it. This reverts the revert commit `c7fc0e5da6`.	2020-02-23 18:33:18 +00:00
Florian Hahn	c7fc0e5da6	Revert "[PatternMatch] Match XOR variant of unsigned-add overflow check." This reverts commit `e01a3d49c2` and commit `a6a585b803`. This causes a failure on GreenDragon: http://lab.llvm.org:8080/green/view/LLDB/job/lldb-cmake/9597	2020-02-19 19:37:08 +01:00
Florian Hahn	e01a3d49c2	[PatternMatch] Match XOR variant of unsigned-add overflow check. InstCombine folds (a + b <u a) to (a ^ -1 <u b) and that does not match the expected pattern in CodeGenPrepare via UAddWithOverflow. This causes a regression over Clang 7 on both X86 and AArch64: https://gcc.godbolt.org/z/juhXYV This patch extends UAddWithOverflow to also catch the XOR case, if the XOR is only used in the ICMP. This covers just a single case, but I'd like to make sure I am not missing anything before tackling the other cases. Reviewers: nikic, RKSimon, lebedev.ri, spatel Reviewed By: nikic, lebedev.ri Differential Revision: https://reviews.llvm.org/D74228	2020-02-19 15:25:18 +01:00
Nikita Popov	9adedd146d	[InstCombine] Relax preconditions for ashr+and+icmp fold (PR44754) Fix for https://bugs.llvm.org/show_bug.cgi?id=44754. We already have a fold that converts icmp (and (ashr X, C3), C2), C1 into icmp (and C2'), C1', but it imposed overly strict requirements on the transform. Relax this by checking that both C2 and C1 don't shift out bits (in a signed sense) when forming the new constants. Alive proofs (https://rise4fun.com/Alive/PTz0): Name: ashr_legal Pre: ((C2 << C3) >> C3) == C2 && ((C1 << C3) >> C3) == C1 %a = ashr i16 %x, C3 %b = and i16 %a, C2 %c = icmp i16 %b, C1 => %d = and i16 %x, C2 << C3 %c = icmp i16 %d, C1 << C3 Name: ashr_shiftout_eq Pre: ((C2 << C3) >> C3) == C2 && ((C1 << C3) >> C3) != C1 %a = ashr i16 %x, C3 %b = and i16 %a, C2 %c = icmp eq i16 %b, C1 => %c = false Note that >> corresponds to ashr here. The case of an equality comparison has some special handling in this transform, because it will fold to a true/false result if the condition on the comparison constant is violated. Differential Revision: https://reviews.llvm.org/D74294	2020-02-18 17:49:46 +01:00
Nikita Popov	5a8819b216	[InstCombine] Use replaceOperand() in more places This is a followup to D73803, which uses the replaceOperand() helper in more places. This should be NFC apart from changes to worklist order. Differential Revision: https://reviews.llvm.org/D73919	2020-02-11 17:38:23 +01:00
Nikita Popov	a05932931c	[InstCombine] Refactor foldICmpAndShift(); NFCI Separate out handling for shl, lshr and ashr. The combined handling obscured some overly pessimistic requirements for the transform.	2020-02-08 22:27:43 +01:00
Nikita Popov	d4627b90a0	[InstCombine] Avoid modifying instructions in-place As discussed on D73919, this replaces a few cases where we were modifying multiple operands of instructions in-place with the creation of a new instruction, which we generally prefer nowadays. This tends to be more readable and less prone to worklist management bugs. Test changes are only superficial (instruction naming and order).	2020-02-08 17:05:56 +01:00
Nikita Popov	878cb38a5c	[InstCombine] Add replaceOperand() helper Adds a replaceOperand() helper, which is like Instruction.setOperand() but adds the old operand to the worklist. This reduces the amount of missing or incorrect worklist management. This only applies the helper to a relatively small subset of setOperand() calls in InstCombine, namely those of the pattern `I.setOperand(); return &I;`, where it is most obviously applicable. Differential Revision: https://reviews.llvm.org/D73803	2020-02-03 19:00:17 +01:00
Nikita Popov	e6c9ab4fb7	[InstCombine] Rename worklist methods; NFC This renames Worklist.AddDeferred() to Worklist.add() and Worklist.Add() to Worklist.push(). The intention here is that Worklist.add() should be the go-to method for explicit worklist management, while the raw Worklist.push() is mostly for InstCombine internals. I will then migrate uses of Worklist.push() to Worklist.add() in followup changes. As suggested by spatel on D73411 I'm also changing the remaining method names to lowercase first character, in line with current coding standards. Differential Revision: https://reviews.llvm.org/D73745	2020-02-03 18:56:51 +01:00
Nikita Popov	90b5ed996b	[InstCombine] Remove unnecessary worklist add; NFCI The IRBuilder will automatically add instructions to the worklist. Adding it manually is unnecessary, but may mess up worklist order.	2020-01-30 23:06:28 +01:00
Nikita Popov	cad91074a6	[InstCombine] Create new insts in foldICmpEqIntrinsicWithConstant; NFCI In line with current conventions, create new instructions rather than modify two operands in place and performing manual worklist management. This should be NFC apart from possible worklist order changes.	2020-01-30 23:03:16 +01:00
Nikita Popov	e086e23024	[InstCombine] Support non-splat vectors in icmp eq + add/sub fold For the icmp eq (add X, C1), C2 => icmp eq X, C2-C1 icmp eq (sub C1, X), C2 => icmp eq X, C1-C2 folds, this allows C1 to be non-splat and contain undefs. C2 is still splat, due to the structure of the code. This is to address the remaining part of the regression in D73411, where demanded element analysis replaces some elements with undef. Differential Revision: https://reviews.llvm.org/D73647	2020-01-29 20:56:58 +01:00
Sanjay Patel	87f6314f8c	[InstCombine] canonicalize splat shuffle after cmp cmp (splat V1, M), SplatC --> splat (cmp V1, SplatC'), M As discussed in PR44588: https://bugs.llvm.org/show_bug.cgi?id=44588 ...we try harder to push shuffles after binops than after compares. This patch handles the special (but presumably most common case) of splat shuffles. If both operands are splats, then we can do the comparison on the non-splat inputs followed by splat of the compare. That should take care of the regression noted in D73411. There's another potential fold requested in PR37463 to scalarize the compare, but that's another patch (and it's not clear if we can do that without the ability to undo it later): https://bugs.llvm.org/show_bug.cgi?id=37463 Differential Revision: https://reviews.llvm.org/D73575	2020-01-29 08:34:29 -05:00
Sanjay Patel	7a717d82ff	[InstCombine] refactor foldVectorCmp(); NFC We can handle other patterns here as shown in PR44588.	2020-01-28 14:40:48 -05:00
Nikita Popov	efba7ed05e	[PatternMatch] Make m_c_ICmp swap the predicate (PR42801) This addresses https://bugs.llvm.org/show_bug.cgi?id=42801. The m_c_ICmp() matcher is changed to provide the swapped predicate if the operands are swapped. Existing uses of m_c_ICmp() fall in one of two categories: Working on equality predicates only, where swapping is irrelevant. Or performing a manual swap, in which case this patch removes it. The only exception is the foldICmpWithLowBitMaskedVal() fold, which does not swap the predicate, and instead reasons about whether a swap occurred or not for each predicate. Getting the swapped predicate allows us to merge the logic for pairs of predicates, instead of duplicating it. Differential Revision: https://reviews.llvm.org/D72976	2020-01-22 22:56:26 +01:00
Sanjay Patel	1640582743	[InstCombine] replace undef elements in vector constant when doing icmp folds (PR44383) As shown in PR44383: https://bugs.llvm.org/show_bug.cgi?id=44383 ...we can't safely propagate a vector constant through this icmp fold if that vector constant contains undefined elements. We know that each defined element of the constant is safe though, so find the first of those and replicate it into the formerly undef lanes. Differential Revision: https://reviews.llvm.org/D72101	2020-01-03 09:16:57 -05:00
Nicola Zaghen	97572775d2	Reland [DataLayout] Fix occurrences that size and range of pointers are assumed to be the same. GEP index size can be specified in the DataLayout, introduced in D42123. However, there were still places in which getIndexSizeInBits was used interchangeably with getPointerSizeInBits. This notably caused issues with InstCombine's visitPtrToInt; but the unit tests were incorrect, so this remained undiscovered. This fixes the buildbot failures. Differential Revision: https://reviews.llvm.org/D68328 Patch by Joseph Faulls!	2019-12-13 14:30:21 +00:00
Nicola Zaghen	f798eb21ec	Temporarily Revert "[DataLayout] Fix occurrences that size and range of pointers are assumed to be the same." This reverts commit `5f6208778f`. This caused failures in Transforms/PhaseOrdering/scev-custom-dl.ll const: Assertion `getBitWidth() == CR.getBitWidth() && "ConstantRange types don't agree!"' failed.	2019-12-12 10:29:54 +00:00
Nicola Zaghen	5f6208778f	[DataLayout] Fix occurrences that size and range of pointers are assumed to be the same. GEP index size can be specified in the DataLayout, introduced in D42123. However, there were still places in which getIndexSizeInBits was used interchangeably with getPointerSizeInBits. This notably caused issues with InstCombine's visitPtrToInt; but the unit tests were incorrect, so this remained undiscovered. Differential Revision: https://reviews.llvm.org/D68328 Patch by Joseph Faulls!	2019-12-12 10:07:01 +00:00
Nikita Popov	8db5143b1a	[InstCombine] Optimize overflow check based on uadd.with.overflow result Fix for https://bugs.llvm.org/show_bug.cgi?id=40846. This adds a combine for cases where a (a + b) < a style overflow check is performed, but with a + b being the result of uadd.with.overflow, so the overflow result is also already available and we can just use it. Subsequently GVN/CSE will deduplicate the extracts. We can run into this situation if you have both a uadd.with.overflow and a manual add + overflow check in the same function (on the same operands), in which case GVN will rewrite the add to the with.overflow result and leave you with this pattern. The implementation is a bit ugly because I'm handling the various canonicalization edge cases. This does not yet handle the negated version of this pattern. Differential Revision: https://reviews.llvm.org/D58644	2019-12-11 20:52:04 +01:00
Roman Lebedev	0f22e783a0	[InstCombine] Revert rL341831: relax one-use check in foldICmpAddConstant() (PR44100) rL341831 moved one-use check higher up, restricting a few folds that produced a single instruction from two instructions to the case where the inner instruction would go away. Original commit message: > InstCombine: move hasOneUse check to the top of foldICmpAddConstant > > There were two combines not covered by the check before now, > neither of which actually differed from normal in the benefit analysis. > > The most recent seems to be because it was just added at the top of the > function (naturally). The older is from way back in 2008 (r46687) > when we just didn't put those checks in so routinely, and has been > diligently maintained since. From the commit message alone, there doesn't seem to be a deeper motivation, or a deeper problem that it was trying to solve, other than 'fixing the wrong one-use check'. As i have briefly discussed in IRC with Tim, the original motivation can no longer be recovered, too much time has passed. However i believe that the original fold was doing the right thing, we should be performing such a transformation even if the inner `add` will not go away - that will still unchain the comparison from `add`, it will no longer need to wait for `add` to compute. Doing so doesn't seem to break any particular idioms, at least as far as i can see. References https://bugs.llvm.org/show_bug.cgi?id=44100	2019-12-02 18:06:15 +03:00
Dávid Bolvanský	d825ed24d2	Revert "[InstructionCompares] Fixed null check after dereferencing warning. NFCI." This reverts commit `b8685cf304`.	2019-11-03 20:24:01 +01:00
Dávid Bolvanský	b8685cf304	[InstructionCompares] Fixed null check after dereferencing warning. NFCI.	2019-11-03 20:13:45 +01:00
Sanjay Patel	a22282be54	[InstCombine] make icmp vector canonicalization safe for constant with undef elements This is a fix for: https://bugs.llvm.org/show_bug.cgi?id=43730 ...and as shown there, we have existing test cases that show potential miscompiles. We could just bail out for vector constants that contain any undef elements, or we can do as shown here: allow the transform, but replace the undefs with a safe value. For most of the tests shown, this results in a full splat constant (no undefs) which is probably a win for further IR analysis because we conservatively don't match undefs in most cases. Codegen can probably recover these kinds of undef lanes via demanded elements analysis if that's profitable. Differential Revision: https://reviews.llvm.org/D69519	2019-10-29 10:58:14 -04:00
Nikita Popov	b1b7a2f7b6	[InstCombine] Fold uadd.sat(a, b) == 0 and usub.sat(a, b) == 0 This adds folds for comparing uadd.sat/usub.sat with zero: * uadd.sat(a, b) == 0 => a == 0 && b == 0 => (a | b) == 0 * usub.sat(a, b) == 0 => a <= b And inverted forms for !=. Differential Revision: https://reviews.llvm.org/D69224 llvm-svn: 375374	2019-10-20 20:19:42 +00:00
Roman Lebedev	49483a3bc2	[InstCombine] Shift amount reassociation in shifty sign bit test (PR43595) Summary: This problem consists of several parts: * Basic sign bit extraction - `trunc? (?shr %x, (bitwidth(x)-1))`. This is trivial, and easy to do, we have a fold for it. * Shift amount reassociation - if we have two identical shifts, and we can simplify-add their shift amounts together, then we likely can just perform them as a single shift. But this is finicky, has one-use restrictions, and shift opcodes must be identical. But there is a super-pattern where both of these work together to produce sign bit test from two shifts + comparison. We do indeed already handle this in most cases. But since we get that fold transitively, it has one-use restrictions. And what's worse, in this case the right-shifts aren't required to be identical, and we can't handle that transitively: If the total shift amount is bitwidth-1, only a sign bit will remain in the output value. But if we look at this from the perspective of two shifts, we can't fold - we can't possibly know what bit pattern we'd produce via two shifts, it will be some kind of a mask produced from original sign bit, but we just can't tell its shape: https://rise4fun.com/Alive/cM0 https://rise4fun.com/Alive/9IN But it will only contain sign bit and zeros. So from the perspective of sign bit test, we're good: https://rise4fun.com/Alive/FRz https://rise4fun.com/Alive/qBU Superb! So the simplest solution is to extend `reassociateShiftAmtsOfTwoSameDirectionShifts()` to also have a pseudo-analysis mode that will ignore extra-uses, and will only check whether a) those are two right shifts and b) they end up with bitwidth(x)-1 shift amount and return either the original value that we are sign-checking, or null. This does not have any functionality change for the existing `reassociateShiftAmtsOfTwoSameDirectionShifts()`. 
All that being said, as discussed in the review, this yet again increases usage of instsimplify in instcombine as utility. Some day that may need to be reevaluated. https://bugs.llvm.org/show_bug.cgi?id=43595 Reviewers: spatel, efriedma, vsk Reviewed By: spatel Subscribers: xbolva00, hiraditya, llvm-commits Tags: #llvm Differential Revision: https://reviews.llvm.org/D68930 llvm-svn: 375371	2019-10-20 19:38:50 +00:00
Roman Lebedev	0c73be590e	[InstCombine] Move isSignBitCheck(), handle rest of the predicates True, no test coverage is being added here. But those non-canonical predicates that are already handled here already have no test coverage as far as i can tell. I tried to add tests for them, but all the patterns already get handled elsewhere. llvm-svn: 373962	2019-10-07 20:53:08 +00:00
Roman Lebedev	fb5af8b9b9	[InstCombine] Fold 'icmp eq/ne (?trunc (lshr/ashr %x, bitwidth(x)-1)), 0' -> 'icmp sge/slt %x, 0' We do indeed already get it right in some cases, but only transitively, with one-use restrictions. Since we only need to produce a single comparison, it makes sense to match the pattern directly: https://rise4fun.com/Alive/kPg llvm-svn: 373802	2019-10-04 22:16:22 +00:00
Bjorn Pettersson	163c54d288	[InstCombine] Don't assume CmpInst has been visited in getFlippedStrictnessPredicateAndConstant Summary: Removing an assumption (assert) that the CmpInst already has been simplified in getFlippedStrictnessPredicateAndConstant. Solution is to simply bail out instead of hitting the assertion. Instead we assume that any profitable rewrite will happen in the next iteration of InstCombine. The reason why we can't assume that the CmpInst already has been simplified is that the worklist does not guarantee such an ordering. Solves https://bugs.llvm.org/show_bug.cgi?id=43376 Reviewers: spatel, lebedev.ri Reviewed By: lebedev.ri Subscribers: hiraditya, llvm-commits Tags: #llvm Differential Revision: https://reviews.llvm.org/D68022 llvm-svn: 372972	2019-09-26 12:16:01 +00:00
Roman Lebedev	23646952e2	[InstCombine] Fold (A - B) u>=/u< A --> B u>/u<= A iff B != 0 https://rise4fun.com/Alive/KtL This also shows that the fold added in D67412 / r372257 was too specific, and the new fold allows those test cases to be handled more generically, therefore i delete now-dead code. This is yet again motivated by D67122 "[UBSan][clang][compiler-rt] Applying non-zero offset to nullptr is undefined behaviour" llvm-svn: 372912	2019-09-25 19:06:40 +00:00
Sanjay Patel	eb8d39e113	[InstCombine] allow icmp+binop folds before min/max bailout (PR43310) This has the potential to uncover missed analysis/folds as shown in the min/max code comment/test, but fewer restrictions on icmp folds should be better in general to solve cases like: https://bugs.llvm.org/show_bug.cgi?id=43310 llvm-svn: 372510	2019-09-22 14:31:53 +00:00
Sanjay Patel	3961a143e1	[InstCombine] remove unneeded one-use checks for icmp fold Related folds were added in: rL125734 ...the code comment about register pressure is discussed in more detail in: https://bugs.llvm.org/show_bug.cgi?id=2698 But 10 years later, perf testing bzip2 with this change now shows a slight (0.2% average) improvement on Haswell although that's probably within test noise. Given that this is IR canonicalization, we shouldn't be worried about register pressure though; the backend should be able to adjust for that as needed. This is part of solving PR43310 the theoretically right way: https://bugs.llvm.org/show_bug.cgi?id=43310 ...ie, if we don't cripple basic transforms, then we won't need to add special-case code to detect larger patterns. rL371940 and rL371981 are related patches in this series. llvm-svn: 372007	2019-09-16 16:15:25 +00:00
Sanjay Patel	c5cd808156	[InstCombine] remove unneeded one-use checks for icmp fold This fold and several others were added in: rL125734 <https://reviews.llvm.org/rL125734> ...with no explanation for the one-use checks other than the code comments about register pressure. Given that this is IR canonicalization, we shouldn't be worried about register pressure though; the backend should be able to adjust for that as needed. This is part of solving PR43310 the theoretically right way: https://bugs.llvm.org/show_bug.cgi?id=43310 ...ie, if we don't cripple basic transforms, then we won't need to add special-case code to detect larger patterns. rL371940 is a related patch in this series. llvm-svn: 371981	2019-09-16 12:54:34 +00:00
Sanjay Patel	91c2cd0691	[InstCombine] fix comments to match code; NFC This blob was written before match() existed, so it could probably be reduced significantly. But I suspect it isn't well tested, so tests would have to be added to reduce risk from logic changes. llvm-svn: 371978	2019-09-16 12:12:05 +00:00
Sanjay Patel	3daf168fa9	[InstCombine] remove unneeded one-use checks for icmp fold This fold and several others were added in: rL125734 ...with no explanation for the one-use checks other than the code comments about register pressure. Given that this is IR canonicalization, we shouldn't be worried about register pressure though; the backend should be able to adjust for that as needed. There are similar checks as noted with the TODO comments. I'm hoping to remove those restrictions too, but if any of these does cause a regression, it should be easier to correct by making small, individual commits. This is part of solving PR43310 the theoretically right way: https://bugs.llvm.org/show_bug.cgi?id=43310 ...ie, if we don't cripple basic transforms, then we won't need to add special-case code to detect larger patterns. llvm-svn: 371940	2019-09-15 20:56:34 +00:00
Sanjay Patel	80bea345d1	[InstCombine] fold sign-bit compares of srem (srem X, pow2C) sgt/slt 0 can be reduced using bit hacks by masking off the sign bit and the modulo (low) bits: https://rise4fun.com/Alive/jSO A '2' divisor allows slightly more folding: https://rise4fun.com/Alive/tDBM Any chance to remove an 'srem' use is probably worthwhile, but this is limited to the one-use improvement case because doing more may expose other missing folds. That means it does nothing for PR21929 yet: https://bugs.llvm.org/show_bug.cgi?id=21929 Differential Revision: https://reviews.llvm.org/D67334 llvm-svn: 371610	2019-09-11 12:04:26 +00:00
Matt Arsenault	524a9d5774	InstCombine: Fix crash on icmp of gep with addrspacecasted null llvm-svn: 371146	2019-09-05 23:39:21 +00:00
Roman Lebedev	8360c42e25	[InstCombine] foldICmpBinOp(): consider inverted check in 'unsigned sub overflow' check A follow-up for r329011. This may be changed to produce @llvm.sub.with.overflow in a later patch, but for now just make things more consistent overall. A few observations stem from this: * There does not seem to be a similar one-instruction fold for uadd-overflow * I'm not sure we'll want to canonicalize `B u> A` as `usub.with.overflow`, so since the `icmp` here no longer refers to `sub`, reconstructing `usub.with.overflow` will be problematic, and will likely require standalone pass (similar to DivRemPairs). https://rise4fun.com/Alive/Zqs Name: (A - B) u> A --> B u> A %t0 = sub i8 %A, %B %r = icmp ugt i8 %t0, %A => %r = icmp ugt i8 %B, %A Name: (A - B) u<= A --> B u<= A %t0 = sub i8 %A, %B %r = icmp ule i8 %t0, %A => %r = icmp ule i8 %B, %A Name: C u< (C - D) --> C u< D %t0 = sub i8 %C, %D %r = icmp ult i8 %C, %t0 => %r = icmp ult i8 %C, %D Name: C u>= (C - D) --> C u>= D %t0 = sub i8 %C, %D %r = icmp uge i8 %C, %t0 => %r = icmp uge i8 %C, %D llvm-svn: 371101	2019-09-05 17:41:02 +00:00
Roman Lebedev	ecb7ea1ae7	[InstCombine] foldICmpBinOp(): consider inverted check in 'unsigned add overflow' check A follow-up for r342004. This will be changed to produce @llvm.add.with.overflow in a later patch, but for now just make things more consistent overall. https://rise4fun.com/Alive/qxE Name: (Op1 + X) u< Op1 --> ~Op1 u< X %t0 = add i8 %Op1, %X %r = icmp ult i8 %t0, %Op1 => %n = xor i8 %Op1, -1 %r = icmp ult i8 %n, %X Name: (Op1 + X) u>= Op1 --> ~Op1 u>= X %t0 = add i8 %Op1, %X %r = icmp uge i8 %t0, %Op1 => %n = xor i8 %Op1, -1 %r = icmp uge i8 %n, %X ;------------------------------------------------------------------------------- Name: Op0 u> (Op0 + X) --> X u> ~Op0 %t0 = add i8 %Op0, %X %r = icmp ugt i8 %Op0, %t0 => %n = xor i8 %Op0, -1 %r = icmp ugt i8 %X, %n Name: Op0 u<= (Op0 + X) --> X u<= ~Op0 %t0 = add i8 %Op0, %X %r = icmp ule i8 %Op0, %t0 => %n = xor i8 %Op0, -1 %r = icmp ule i8 %X, %n llvm-svn: 371100	2019-09-05 17:40:49 +00:00
Roman Lebedev	473a063a5e	[InstCombine] Fold '((%x * %y) u/ %x) != %y' to '@llvm.umul.with.overflow' + overflow bit extraction Summary: `((%x * %y) u/ %x) != %y` is one of (3?) common ways to check that some unsigned multiplication (will not) overflow. Currently, we don't catch it. We could: ``` $ /repositories/alive2/build-Clang-unknown/alive -root-only ~/llvm-patch1.ll Processing /home/lebedevri/llvm-patch1.ll.. ---------------------------------------- Name: no overflow %o0 = mul i4 %y, %x %o1 = udiv i4 %o0, %x %r = icmp ne i4 %o1, %y ret i1 %r => %n0 = umul_overflow i4 %x, %y %o0 = extractvalue {i4, i1} %n0, 0 %o1 = udiv %o0, %x %r = extractvalue {i4, i1} %n0, 1 ret %r Done: 1 Optimization is correct! ---------------------------------------- Name: no overflow %o0 = mul i4 %y, %x %o1 = udiv i4 %o0, %x %r = icmp eq i4 %o1, %y ret i1 %r => %n0 = umul_overflow i4 %x, %y %o0 = extractvalue {i4, i1} %n0, 0 %o1 = udiv %o0, %x %n1 = extractvalue {i4, i1} %n0, 1 %r = xor %n1, -1 ret i1 %r Done: 1 Optimization is correct! ``` Reviewers: nikic, spatel, efriedma, xbolva00, RKSimon Reviewed By: nikic Subscribers: hiraditya, llvm-commits Tags: #llvm Differential Revision: https://reviews.llvm.org/D65144 llvm-svn: 370348	2019-08-29 12:47:20 +00:00
Roman Lebedev	fb38b7aab3	[InstCombine] Fold '(-1 u/ %x) u< %y' to '@llvm.umul.with.overflow' + overflow bit extraction Summary: `(-1 u/ %x) u< %y` is one of (3?) common ways to check that some unsigned multiplication (will not) overflow. Currently, we don't catch it. We could: ``` ---------------------------------------- Name: no overflow %o0 = udiv i4 -1, %x %r = icmp ult i4 %o0, %y => %o0 = udiv i4 -1, %x %n0 = umul_overflow i4 %x, %y %r = extractvalue {i4, i1} %n0, 1 Done: 1 Optimization is correct! ---------------------------------------- Name: no overflow, swapped %o0 = udiv i4 -1, %x %r = icmp ugt i4 %y, %o0 => %o0 = udiv i4 -1, %x %n0 = umul_overflow i4 %x, %y %r = extractvalue {i4, i1} %n0, 1 Done: 1 Optimization is correct! ---------------------------------------- Name: overflow %o0 = udiv i4 -1, %x %r = icmp uge i4 %o0, %y => %o0 = udiv i4 -1, %x %n0 = umul_overflow i4 %x, %y %n1 = extractvalue {i4, i1} %n0, 1 %r = xor %n1, -1 Done: 1 Optimization is correct! ---------------------------------------- Name: overflow %o0 = udiv i4 -1, %x %r = icmp ule i4 %y, %o0 => %o0 = udiv i4 -1, %x %n0 = umul_overflow i4 %x, %y %n1 = extractvalue {i4, i1} %n0, 1 %r = xor %n1, -1 Done: 1 Optimization is correct! ``` As it can be observed from tests, while simply forming the `@llvm.umul.with.overflow` is easy, if we were looking for the inverted answer, then more work needs to be done to cleanup the now-pointless control-flow that was guarding against division-by-zero. This is being addressed in follow-up patches. Reviewers: nikic, spatel, efriedma, xbolva00, RKSimon Reviewed By: nikic, xbolva00 Subscribers: hiraditya, llvm-commits Tags: #llvm Differential Revision: https://reviews.llvm.org/D65143 llvm-svn: 370347	2019-08-29 12:47:08 +00:00
Roman Lebedev	f13b0e3ed8	[InstCombine] Shift amount reassociation in bittest: trunc-of-lshr (PR42399) Summary: Finally, the fold i was looking forward to :) The legality check is muddy, i doubt i've groked the full generalization, but it handles all the cases i care about, and can come up with: https://rise4fun.com/Alive/26j I.e. we can perform the fold if any of the following is true: * The shift amount is either zero or one less than widest bitwidth * Either of the values being shifted has at most lowest bit set * The value that is being shifted by `shl` (which is not truncated) should have no less leading zeros than the total shift amount; * The value that is being shifted by `lshr` (which is truncated) should have no less leading zeros than the widest bit width minus total shift amount minus one I strongly suspect there is some better generalization, but i'm not aware of it as of right now. For now i also avoided using actual `computeKnownBits()`, but restricted it to constants. Reviewers: spatel, nikic, xbolva00 Reviewed By: spatel Subscribers: hiraditya, llvm-commits Tags: #llvm Differential Revision: https://reviews.llvm.org/D66383 llvm-svn: 370324	2019-08-29 10:26:23 +00:00
Simon Pilgrim	ef9c6a7077	Fix variable set but not used warning on NDEBUG builds. NFCI. llvm-svn: 370317	2019-08-29 09:58:47 +00:00
Craig Topper	f79d8a064c	[InstCombine] Disable recursion in foldGEPICmp for vector pointer GEPs Due to missing vector support in this function, recursion can generate worse code in some cases. llvm-svn: 370221	2019-08-28 15:40:34 +00:00
Craig Topper	5bbb604bb5	[InstCombine] Disable some portions of foldGEPICmp for GEPs that return a vector of pointers. Fix other portions. llvm-svn: 370114	2019-08-27 21:38:56 +00:00
Philip Reames	b92c971099	[InstCombine] icmp eq/ne (gep inbounds P, Idx..), null -> icmp eq/ne P, null for vectors Extend the transform introduced in https://reviews.llvm.org/D66608 to work for vector geps as well. Differential Revision: https://reviews.llvm.org/D66671 llvm-svn: 369949	2019-08-26 19:11:49 +00:00
Roman Lebedev	de19f749e0	[InstCombine] matchThreeWayIntCompare(): commutativity awareness Summary: `matchThreeWayIntCompare()` looks for ``` select i1 (a == b), i32 Equal, i32 (select i1 (a < b), i32 Less, i32 Greater) ``` but both of these selects/compares can be in their commuted form, so out of 8 variants, only the two most basic ones are handled. This fixes a regression introduced in D66232. Reviewers: spatel, nikic, efriedma, xbolva00 Reviewed By: spatel Subscribers: hiraditya, llvm-commits Tags: #llvm Differential Revision: https://reviews.llvm.org/D66607 llvm-svn: 369841	2019-08-24 06:49:36 +00:00
Roman Lebedev	2c75fe7f2a	[InstCombine] Try to reuse constant from select in leading comparison Summary: If we have e.g.: ``` %t = icmp ult i32 %x, 65536 %r = select i1 %t, i32 %y, i32 65535 ``` the constants `65535` and `65536` are suspiciously close. We could perform a transformation to deduplicate them: ``` Name: ult %t = icmp ult i32 %x, 65536 %r = select i1 %t, i32 %y, i32 65535 => %t.inv = icmp ugt i32 %x, 65535 %r = select i1 %t.inv, i32 65535, i32 %y ``` https://rise4fun.com/Alive/avb While this may seem esoteric, this should certainly be good for vectors (less constant pool usage) and for opt-for-size - need to have only one constant. But the real fun part here is that it allows further transformation, in particular it finishes cleaning up the `clamp` folding, see e.g. `canonicalize-clamp-with-select-of-constant-threshold-pattern.ll`. We start with e.g. ``` %dont_need_to_clamp_positive = icmp sle i32 %X, 32767 %dont_need_to_clamp_negative = icmp sge i32 %X, -32768 %clamp_limit = select i1 %dont_need_to_clamp_positive, i32 -32768, i32 32767 %dont_need_to_clamp = and i1 %dont_need_to_clamp_positive, %dont_need_to_clamp_negative %R = select i1 %dont_need_to_clamp, i32 %X, i32 %clamp_limit ``` without this patch we currently produce ``` %1 = icmp slt i32 %X, 32768 %2 = icmp sgt i32 %X, -32768 %3 = select i1 %2, i32 %X, i32 -32768 %R = select i1 %1, i32 %3, i32 32767 ``` which isn't really a `clamp` - both comparisons are performed on the original value, this patch changes it into ``` %1.inv = icmp sgt i32 %X, 32767 %2 = icmp sgt i32 %X, -32768 %3 = select i1 %2, i32 %X, i32 -32768 %R = select i1 %1.inv, i32 32767, i32 %3 ``` and then the magic happens! Some further transform finishes polishing it and we finally get: ``` %t1 = icmp sgt i32 %X, -32768 %t2 = select i1 %t1, i32 %X, i32 -32768 %t3 = icmp slt i32 %t2, 32767 %R = select i1 %t3, i32 %t2, i32 32767 ``` which is beautiful and just what we want. 
Proofs for `getFlippedStrictnessPredicateAndConstant()` for de-canonicalization: https://rise4fun.com/Alive/THl Proofs for the fold itself: https://rise4fun.com/Alive/THl Reviewers: spatel, dmgreen, nikic, xbolva00 Reviewed By: spatel Subscribers: hiraditya, llvm-commits Tags: #llvm Differential Revision: https://reviews.llvm.org/D66232 llvm-svn: 369840	2019-08-24 06:49:25 +00:00
Philip Reames	9cb059fdcc	Fix a bug in just submitted rL369789 Started implementing the vector case and realized the scalar case hadn't handled the GEP producing a different type than the base correctly. It's entertaining seeing what slips through review when we're focused on the 'hard' parts. :( Also adding an extra vector test as it happened to be in workspace and wasn't worth separating. llvm-svn: 369795	2019-08-23 18:27:57 +00:00
Philip Reames	5b02cfa0b3	[InstCombine] icmp eq/ne (gep inbounds P, Idx..), null -> icmp eq/ne P, null This generalizes the isGEPKnownNonNull rule from ValueTracking to apply when we do not know if the base is non-null, and thus need to replace one condition with another. The core notion is that an inbounds GEP can only form null if the base pointer is null and the offset is zero. However, if the offset is non-zero, the "inbounds" marker makes the result poison. Thus, we're free to ignore the case where the offset is non-zero. Similarly, there's no case under which a non-null base can result in a null result without generating poison. Differential Revision: https://reviews.llvm.org/D66608 llvm-svn: 369789	2019-08-23 17:58:58 +00:00
Philip Reames	764b0fd5a3	[instcombine] icmp eq/ne (sub C, Y), C -> icmp eq/ne Y, 0 Noticed while looking at pr43028. llvm-svn: 369541	2019-08-21 15:51:57 +00:00
Sanjay Patel	e728259278	[InstCombine] narrow icmp with extended operands of different widths An intermediate extend is used to widen the narrow operand to the width of the other (wider) operand. At that point, we have the same logic as the existing transform that was restricted to folds of equal width zext/sext. This mostly solves PR42700: https://bugs.llvm.org/show_bug.cgi?id=42700 llvm-svn: 369519	2019-08-21 11:56:08 +00:00
Sanjay Patel	292b1087f4	[InstCombine] add helper function for icmp+zext/sext; NFC llvm-svn: 369421	2019-08-20 18:15:17 +00:00
Sanjay Patel	2e68e4d60e	[InstCombine] make fold for icmp with sext more efficient; NFC We were creating 2 instructions and relying on a subsequent fold to invert a not(icmp). Create the final icmp directly instead. llvm-svn: 369411	2019-08-20 17:03:22 +00:00
Sanjay Patel	a90ee0eeb6	[InstCombine] improve readability for icmp with cast folds; NFC 1. Update function name and stale code comments. 2. Use variable names that are less ambiguous. 3. Move operand checks into the function as early exits. llvm-svn: 369390	2019-08-20 14:56:44 +00:00
Roman Lebedev	9b957d3321	[InstCombine] Cherry-pick NFC cleanups of foldShiftIntoShiftInAnotherHandOfAndInICmp() from D66383 llvm-svn: 369207	2019-08-18 12:26:33 +00:00
Roman Lebedev	16244fccfe	[InstCombine] Shift amount reassociation in bittest: trunc-of-shl (PR42399) Summary: This is continuation of D63829 / https://bugs.llvm.org/show_bug.cgi?id=42399 I thought naive pattern would solve my issue, but nope, it involved truncation, thus more folds needed.. This isn't really the fold i'm interested in, i need trunc-of-lshr, but i've decided to start with `shl` because it's simpler. In this case, no extra legality checks are needed: https://rise4fun.com/Alive/CAb We should be careful about not increasing instruction count, since we need to produce `zext` because `and` is done in wider type. Reviewers: spatel, nikic, xbolva00 Reviewed By: spatel Subscribers: hiraditya, llvm-commits Tags: #llvm Differential Revision: https://reviews.llvm.org/D66057 llvm-svn: 369117	2019-08-16 15:10:41 +00:00
Roman Lebedev	32f1e1a01d	[InstCombine] Refactor getFlippedStrictnessPredicateAndConstant() out of canonicalizeCmpWithConstant(), NFCI I'd like to use it elsewhere, hopefully without reinventing the wheel. No functional change intended so far. llvm-svn: 368820	2019-08-14 09:57:20 +00:00
Roman Lebedev	ccdad6ef48	[InstCombine] foldShiftIntoShiftInAnotherHandOfAndInICmp(): avoid constantexpr pitfall (PR42962) Instead of matching value and then blindly casting to BinaryOperator just to get the opcode, just match instruction and do not cast. Fixes https://bugs.llvm.org/show_bug.cgi?id=42962 llvm-svn: 368554	2019-08-12 11:28:02 +00:00
Roman Lebedev	96474d17c6	[InstCombine][NFC] Use SimplifyAddInst() instead of SimplifyBinOp(Instruction::BinaryOps::Add, ) llvm-svn: 368521	2019-08-10 19:29:10 +00:00
Roman Lebedev	a8d20b4467	[InstCombine] Shift amount reassociation in bittest: relax one-use check when shifting constant If one of the values being shifted is a constant, since the new shift amount is known-constant, the new shift will end up being constant-folded so, we don't need that one-use restriction then. llvm-svn: 368519	2019-08-10 19:28:54 +00:00
Roman Lebedev	64fe806c4e	[InstCombine] Shift amount reassociation in bittest: drop pointless one-use restriction That one-use restriction is not needed for correctness - we have already ensured that one of the shifts will go away, so we know we won't increase the instruction count. So there is no need for that restriction. llvm-svn: 368518	2019-08-10 19:28:44 +00:00
Roman Lebedev	be612ea471	[InstCombine] Fold "x ?% y ==/!= 0" to "x & (y-1) ==/!= 0" iff y is power-of-two Summary: I have stumbled into this by accident while preparing to extend backend `x s% C ==/!= 0` handling. While we did happen to handle this fold in most of the cases, the folding is indirect - we fold `x u% y` to `x & (y-1)` (iff `y` is power-of-two), or first turn `x s% -y` to `x u% y`; that does handle most of the cases. But we can't turn `x s% INT_MIN` to `x u% -INT_MIN`, and thus we end up being stuck with `(x s% INT_MIN) == 0`. There is no such restriction for the more general fold: https://rise4fun.com/Alive/IIeS To be noted, the fold does not enforce that `y` is a constant, so it may indeed increase instruction count. This is consistent with what `x u% y`->`x & (y-1)` already does. I think it makes sense, it's at most one (simple) extra instruction, while `rem`ainder is really much more un-simple (and likely very costly). Reviewers: spatel, RKSimon, nikic, xbolva00, craig.topper Reviewed By: RKSimon Subscribers: hiraditya, llvm-commits Tags: #llvm Differential Revision: https://reviews.llvm.org/D65046 llvm-svn: 367322	2019-07-30 15:28:22 +00:00
Roman Lebedev	c5f92bd67b	[PatternMatch] Generalize m_SpecificInt_ULT() to take ICmpInst::Predicate As discussed in the original review, this may be useful, so let's just do it. llvm-svn: 365652	2019-07-10 16:07:35 +00:00
Sanjay Patel	ddc1b40f26	[InstCombine] reduce more checks for power-of-2-or-zero using ctpop Extends the transform from: rL364341 ...to include another (more common?) pattern that tests whether a value is a power-of-2 (including or excluding zero). llvm-svn: 364856	2019-07-01 22:00:00 +00:00
Roman Lebedev	72b8d41ce8	[InstCombine] Shift amount reassociation in bittest (PR42399) Summary: Given pattern: `icmp eq/ne (and ((x shift Q), (y oppositeshift K))), 0` we should move shifts to the same hand of 'and', i.e. rewrite as `icmp eq/ne (and (x shift (Q+K)), y), 0` iff `(Q+K) u< bitwidth(x)` It might be tempting to not restrict this to situations where we know we'd fold two shifts together, but i'm not sure what rules should there be to avoid endless combine loops. We pick the same shift that was originally used to shift the variable we picked to shift: https://rise4fun.com/Alive/6x1v Should fix PR42399 (https://bugs.llvm.org/show_bug.cgi?id=42399). Reviewers: spatel, nikic, RKSimon Reviewed By: spatel Subscribers: llvm-commits Tags: #llvm Differential Revision: https://reviews.llvm.org/D63829 llvm-svn: 364791	2019-07-01 15:55:15 +00:00
Roman Lebedev	f55818e3a7	[InstCombine] Omit 'urem' where possible This was added in D63390 / rL364286 to backend, but it makes sense to also handle it in middle-end. https://rise4fun.com/Alive/Zsln llvm-svn: 364738	2019-07-01 09:41:43 +00:00
Huihui Zhang	b90cb57b63	[InstCombine] Simplify icmp ult/uge (shl %x, C2), C1 iff C1 is power of two -> icmp eq/ne (and %x, (lshr -C1, C2)), 0. Simplify 'shl' inequality test into 'and' equality test. This pattern happens in the middle-end while simplifying bitfield access, Exposed in https://reviews.llvm.org/D63505 https://rise4fun.com/Alive/6uz Reviewers: lebedev.ri, efriedma Reviewed By: lebedev.ri Subscribers: spatel, hiraditya, llvm-commits Tags: #llvm Differential Revision: https://reviews.llvm.org/D63675 llvm-svn: 364348	2019-06-25 20:44:52 +00:00
Sanjay Patel	fcfa056ceb	[InstCombine] reduce checks for power-of-2-or-zero using ctpop This follows up the transform from rL363956 to use the ctpop intrinsic when checking for power-of-2-or-zero. This is matching the isPowerOf2() patterns used in PR42314: https://bugs.llvm.org/show_bug.cgi?id=42314 But there's at least 1 instcombine follow-up needed to match the alternate form: (v & (v - 1)) == 0; We should have all of the backend expansions handled with: rL364319 (x86-specific changes still needed for optimal code based on subtarget) And the larger patterns to exclude zero as a power-of-2 are joining with this change after: rL364153 ( D63660 ) rL364246 Differential Revision: https://reviews.llvm.org/D63777 llvm-svn: 364341	2019-06-25 18:51:44 +00:00
Huihui Zhang	4626613ffe	[InstCombine] Fold icmp eq/ne (and %x, C), 0 iff (-C) is power of two -> %x u</u>= (-C) earlier. Summary: To generate simplified IR, make sure fold (X & ~C) ==/!= 0 --> X u</u>= C+1 is scheduled before fold ((X << Y) & C) == 0 -> (X & (C >> Y)) == 0. https://rise4fun.com/Alive/7ZN Reviewers: lebedev.ri, efriedma, spatel, craig.topper Reviewed By: lebedev.ri Subscribers: hiraditya, llvm-commits Tags: #llvm Differential Revision: https://reviews.llvm.org/D63505 llvm-svn: 364255	2019-06-25 00:09:10 +00:00
Sanjay Patel	273d97e6bf	[InstCombine] fix typo in comment; NFC llvm-svn: 363974	2019-06-20 20:23:32 +00:00
Sanjay Patel	63311bfb83	[InstCombine] canonicalize check for power-of-2 The form that compares against 0 is better because: 1. It removes a use of the input value. 2. It's the more standard form for this pattern: https://graphics.stanford.edu/~seander/bithacks.html#DetermineIfPowerOf2 3. It results in equal or better codegen (tested with x86, AArch64, ARM, PowerPC, MIPS). This is a root cause for PR42314, but probably doesn't completely answer the codegen request: https://bugs.llvm.org/show_bug.cgi?id=42314 Alive proof: https://rise4fun.com/Alive/9kG Name: is power-of-2 %neg = sub i32 0, %x %a = and i32 %neg, %x %r = icmp eq i32 %a, %x => %dec = add i32 %x, -1 %a2 = and i32 %dec, %x %r = icmp eq i32 %a2, 0 Name: is not power-of-2 %neg = sub i32 0, %x %a = and i32 %neg, %x %r = icmp ne i32 %a, %x => %dec = add i32 %x, -1 %a2 = and i32 %dec, %x %r = icmp ne i32 %a2, 0 llvm-svn: 363956	2019-06-20 17:41:15 +00:00
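
Several of the commit messages above justify a fold by linking to an Alive proof. As a rough, non-authoritative illustration of what those proofs assert, the following Python sketch brute-forces three of the folds over every 8-bit value: the power-of-2 canonicalization (63311bfb83), the remainder-by-power-of-two fold (be612ea471), and the clamp pattern (2c75fe7f2a). The 8-bit width, the scaled-down clamp limits `LO`/`HI`, and all function names are assumptions made for this sketch; none of this is LLVM code, and an exhaustive check at one narrow width is not a substitute for the linked proofs.

```python
BITS = 8                      # assumed width for exhaustive checking; the commits use i32
MASK = (1 << BITS) - 1

def to_signed(v):
    """Reinterpret a BITS-wide bit pattern as a two's-complement integer."""
    v &= MASK
    return v - (1 << BITS) if v & (1 << (BITS - 1)) else v

# --- 63311bfb83: canonicalize check for power-of-2 (zero included) ---
def is_pow2_old(x):
    # %neg = sub 0, %x ; %a = and %neg, %x ; icmp eq %a, %x
    return ((-x & MASK) & x) == x

def is_pow2_new(x):
    # %dec = add %x, -1 ; %a2 = and %dec, %x ; icmp eq %a2, 0
    return (((x - 1) & MASK) & x) == 0

def check_pow2_canonicalization():
    return all(is_pow2_old(x) == is_pow2_new(x) for x in range(1 << BITS))

# --- be612ea471: x ?% y ==/!= 0  ->  x & (y-1) ==/!= 0 iff y is a power of two ---
def urem(x, y):
    return x % y

def srem(x, y):
    # Signed remainder takes the sign of the dividend, as LLVM's srem does.
    sx, sy = to_signed(x), to_signed(y)
    r = abs(sx) % abs(sy)
    return (-r if sx < 0 else r) & MASK

def check_rem_by_pow2():
    # Bit patterns with a single set bit; the top one is INT_MIN when signed.
    pow2 = [1 << k for k in range(BITS)]
    return all(
        (rem(x, y) == 0) == ((x & (y - 1)) == 0)
        for rem in (urem, srem)
        for y in pow2
        for x in range(1 << BITS)
    )

# --- 2c75fe7f2a: the clamp pattern before and after the folds ---
LO, HI = -8, 7                # scaled-down stand-ins for -32768 and 32767

def clamp_original(x):
    dont_clamp_pos = x <= HI
    dont_clamp_neg = x >= LO
    limit = LO if dont_clamp_pos else HI
    return x if (dont_clamp_pos and dont_clamp_neg) else limit

def clamp_final(x):
    t2 = x if x > LO else LO          # smax-like select on the original value
    return t2 if t2 < HI else HI      # smin-like select on the already-clamped value

def check_clamp():
    half = 1 << (BITS - 1)
    return all(clamp_original(x) == clamp_final(x) for x in range(-half, half))

if __name__ == "__main__":
    print(check_pow2_canonicalization(), check_rem_by_pow2(), check_clamp())
```

The signed case with `y` equal to the top-bit pattern is the `x s% INT_MIN` situation that the be612ea471 message singles out as not reachable through the indirect `urem` route.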

832 Commits