llvm-project

Commit Graph

Author	SHA1	Message	Date
Nikita Popov	f6b87da0c7	[InstCombine] Fold comparison of abs with int min If the abs is poisoning, this is already folded to true/false. For non-poisoning abs, we can convert this to a comparison with the operand.	2020-09-08 20:23:03 +02:00
Nikita Popov	e97f3b1b43	[InstCombine] Fold abs of known negative operand If we know that the abs operand is known negative, we can replace it with a neg. To avoid computing known bits twice, I've removed the fold for the non-negative case from InstSimplify. Both the non-negative and the negative case are handled by InstCombine now, with one known bits call. Differential Revision: https://reviews.llvm.org/D87196	2020-09-08 20:14:35 +02:00
Sanjay Patel	8b30067919	[InstCombine] improve fold of pointer differences This was supposed to be an NFC cleanup, but there's a real logic difference (did not drop 'nsw') visible in some tests in addition to an efficiency improvement. This is because in the case where we have 2 GEPs, the code was always swapping the operands and negating the result. But if we have 2 GEPs, we should never need swapping/negation AFAICT. This is part of improving flags propagation noticed with PR47430.	2020-09-07 15:54:32 -04:00
Sanjay Patel	7a6d6f0f70	[InstCombine] improve folds for icmp with multiply operands (PR47432) Check for no overflow along with an odd constant before we lose information by converting to bitwise logic. https://rise4fun.com/Alive/2Xl Pre: C1 != 0 %mx = mul nsw i8 %x, C1 %my = mul nsw i8 %y, C1 %r = icmp eq i8 %mx, %my => %r = icmp eq i8 %x, %y Name: nuw ne Pre: C1 != 0 %mx = mul nuw i8 %x, C1 %my = mul nuw i8 %y, C1 %r = icmp ne i8 %mx, %my => %r = icmp ne i8 %x, %y Name: odd ne Pre: C1 % 2 != 0 %mx = mul i8 %x, C1 %my = mul i8 %y, C1 %r = icmp ne i8 %mx, %my => %r = icmp ne i8 %x, %y	2020-09-07 12:40:37 -04:00
Sanjay Patel	b22910daab	[InstCombine] erase instructions leading up to unreachable Normal dead code elimination ignores assume intrinsics, so we fail to delete assumes that are not meaningful (and potentially worse if they cause conflicts with other assumptions). The motivating example in https://llvm.org/PR47416 suggests that we might have problems upstream from here (difference between C and C++), but this should be a cheap way to make sure we remove more dead code. Differential Revision: https://reviews.llvm.org/D87149	2020-09-07 10:44:08 -04:00
Sanjay Patel	3ca8b9a560	[InstCombine] give a name to an intermediate value for easier tracking; NFC As noted in PR47430, we probably want to conditionally include 'nsw' here anyway, so we are going to need to fill out the optional args.	2020-09-07 08:19:42 -04:00
Nikita Popov	4892d3a198	[InstCombine] Fold abs with dominating condition Similar to D87168, but for abs. If we have a dominating x >= 0 condition, then we know that abs(x) is x. This fold is in InstCombine, because we need to create a sub instruction for the x < 0 case. Differential Revision: https://reviews.llvm.org/D87184	2020-09-05 16:18:35 +02:00
Nikita Popov	ada8a17d94	[InstCombine] Fold abs intrinsic eq zero Following the same transform for the select version of abs.	2020-09-05 15:11:38 +02:00
Nikita Popov	58b28fa7a2	[InstCombine] Fold mul of abs intrinsic Same as the existing SPF_ABS fold. We don't need to explicitly handle NABS, as the negs will get folded away first.	2020-09-05 12:37:45 +02:00
Nikita Popov	10cb23c6ca	[InstCombine] Fold cttz of abs intrinsic Same as the existing fold for SPF_ABS. We don't need to explicitly handle the NABS variant, as we'll first fold away the neg in that case.	2020-09-05 12:25:41 +02:00
Sanjay Patel	2391a34f9f	[InstCombine] canonicalize all commutative intrinsics with constant arg	2020-09-03 12:42:04 -04:00
Eli Friedman	96ef6998df	[InstCombine] Fix a couple crashes with extractelement on a scalable vector. Differential Revision: https://reviews.llvm.org/D86989	2020-09-02 18:02:07 -07:00
Venkataramanan Kumar	626c3738cd	[InstCombine] Transform 1.0/sqrt(X) * X to X/sqrt(X) These transforms will now be performed irrespective of the number of uses for the expression "1.0/sqrt(X)": 1.0/sqrt(X) * X => X/sqrt(X) X * 1.0/sqrt(X) => X/sqrt(X) We already handle more general cases, and we are intentionally not creating extra (and likely expensive) fdiv ops in IR. This pattern is the exception to the rule because we always expect the Backend to reduce X/sqrt(X) to sqrt(X), if it has the necessary (reassoc) fast-math-flags. Ref: DagCombiner optimizes the X/sqrt(X) to sqrt(X). Differential Revision: https://reviews.llvm.org/D86726	2020-09-02 08:23:48 -04:00
Christopher Tetreault	640f20b0c7	[SVE] Remove calls to VectorType::getNumElements from InstCombine Reviewed By: efriedma Differential Revision: https://reviews.llvm.org/D82237	2020-08-31 12:59:10 -07:00
Roman Lebedev	c23aefd7c3	[NFC][InstCombine] visitPHINode(): cleanup PHI CSE instruction replacement As @nikic is pointing out in https://reviews.llvm.org/rGbf21ce7b908e#inline-4647 this must be sufficient otherwise `EliminateDuplicatePHINodes()` would have hit issues with it already.	2020-08-31 22:29:39 +03:00
Roman Lebedev	bf21ce7b90	[InstCombine] Take 3: Perform trivial PHI CSE The original take 1 was `6102310d81`, which taught InstSimplify to do that, which seemed better at time, since we got EarlyCSE support for free. However, it was proven that we can not do that there, the simplified-to PHI would not be reachable from the original PHI, and that is not something InstSimplify is allowed to do, as noted in the commit `ed90f15efb` that reverted it: > It appears to cause compilation non-determinism and caused stage3 mismatches. Then there was take 2 `3e69871ab5`, which was InstCombine-specific, but it again showed stage2-stage3 differences, and reverted in `bdaa3f86a0`. This is quite alarming. Here, let's try to change how we find existing PHI candidate: due to the worklist order, and the way PHI nodes are inserted (it may be inserted as the first one, or maybe not), let's look at all PHI nodes in the block. Effects on vanilla llvm test-suite + RawSpeed: ``` \| statistic name \| baseline \| proposed \| Δ \| % \| \\|%\\| \| \|----------------------------------------------------\|-----------\|-----------\|-------:\|---------:\|---------:\| \| asm-printer.EmittedInsts \| 7942329 \| 7942457 \| 128 \| 0.00% \| 0.00% \| \| assembler.ObjectBytes \| 254295632 \| 254312480 \| 16848 \| 0.01% \| 0.01% \| \| correlated-value-propagation.NumPhis \| 18412 \| 18347 \| -65 \| -0.35% \| 0.35% \| \| early-cse.NumCSE \| 2183283 \| 2183267 \| -16 \| 0.00% \| 0.00% \| \| early-cse.NumSimplify \| 550105 \| 541842 \| -8263 \| -1.50% \| 1.50% \| \| instcombine.NumAggregateReconstructionsSimplified \| 73 \| 4506 \| 4433 \| 6072.60% \| 6072.60% \| \| instcombine.NumCombined \| 3640311 \| 3644419 \| 4108 \| 0.11% \| 0.11% \| \| instcombine.NumDeadInst \| 1778204 \| 1783205 \| 5001 \| 0.28% \| 0.28% \| \| instcombine.NumPHICSEs \| 0 \| 22490 \| 22490 \| 0.00% \| 0.00% \| \| instcombine.NumWorklistIterations \| 2023272 \| 2024400 \| 1128 \| 0.06% \| 0.06% \| \| instcount.NumCallInst \| 1758395 \| `1758802` \| 407 \| 0.02% \| 0.02% \| \| instcount.NumInvokeInst \| 59478 \| 59502 \| 24 \| 0.04% \| 0.04% \| \| instcount.NumPHIInst \| 330557 \| 330545 \| -12 \| 0.00% \| 0.00% \| \| instcount.TotalBlocks \| 1077138 \| 1077220 \| 82 \| 0.01% \| 0.01% \| \| instcount.TotalFuncs \| 101442 \| 101441 \| -1 \| 0.00% \| 0.00% \| \| instcount.TotalInsts \| 8831946 \| 8832606 \| 660 \| 0.01% \| 0.01% \| \| simplifycfg.NumHoistCommonCode \| 24186 \| 24187 \| 1 \| 0.00% \| 0.00% \| \| simplifycfg.NumInvokes \| 4300 \| 4410 \| 110 \| 2.56% \| 2.56% \| \| simplifycfg.NumSimpl \| 1019813 \| 999767 \| -20046 \| -1.97% \| 1.97% \| ``` So it fires 22490 times, which is less than ~24k the take 1 did, but more than what take 2 did (22228 times) . It allows foldAggregateConstructionIntoAggregateReuse() to actually work after PHI-of-extractvalue folds did their thing. Previously SimplifyCFG would have done this PHI CSE, of all places. Additionally, allows some more `invoke`->`call` folds to happen (+110, +2.56%). All in all, expectedly, this catches less things overall, but all the motivational cases are still caught, so all good.	2020-08-29 18:21:24 +03:00
Roman Lebedev	bdaa3f86a0	Revert "[InstCombine] Take 2: Perform trivial PHI CSE" While the original variant with doing this in InstSimplify (rightfully) caused questions and ultimately was detected to be a culprit of stage2-stage3 mismatch, it was expected that InstCombine-based implementation would be fine. But apparently it's not, as http://lab.llvm.org:8011/builders/clang-with-thin-lto-ubuntu/builds/24095/steps/compare-compilers/logs/stdio suggests. Which suggests that somewhere in InstCombine there is a loop over nondeterministically sorted container, which causes different worklist ordering. This reverts commit `3e69871ab5`.	2020-08-29 16:05:02 +03:00
Nikita Popov	6093b14c2c	[InstCombine] Return replaceInstUsesWith() result (NFC) Follow the usual usage pattern for this function and return the result.	2020-08-29 14:49:57 +02:00
Roman Lebedev	71ac9105cd	[InstCombine] foldAggregateConstructionIntoAggregateReuse(): use InstCombiner::replaceInstUsesWith() instead of RAUW We really shouldn't use RAUW in InstCombine because we should consistently update Worklist to avoid extra iterations.	2020-08-29 15:10:14 +03:00
Roman Lebedev	e65f213178	[InstCombine] canonicalizeICmpPredicate(): use InstCombiner::replaceInstUsesWith() instead of RAUW We really shouldn't use RAUW in InstCombine because we should consistently update Worklist to avoid extra iterations.	2020-08-29 15:10:14 +03:00
Roman Lebedev	bd12113f57	[NFC][InstCombine] Fix some comments: the code already uses IC::replaceInstUsesWith()	2020-08-29 15:10:14 +03:00
Roman Lebedev	49d223274f	[NFC][InstCombine] Add STATISTIC() for how many iterations we did As we've established, if it takes more than two iterations (one to perform folding and one to ensure that no folding opportunities remain) per function, then there are worklist management issues. So it may be interesting to keep track of it.	2020-08-29 15:10:13 +03:00
Roman Lebedev	4f4eecf0ec	[InstCombine] visitPHINode(): use InstCombiner::replaceInstUsesWith() instead of RAUW As noted in post-commit review, we really shouldn't use RAUW in InstCombine because we should consistently update Worklist to avoid extra iterations.	2020-08-29 15:10:00 +03:00
Roman Lebedev	3e69871ab5	[InstCombine] Take 2: Perform trivial PHI CSE The original take was `6102310d81`, which taught InstSimplify to do that, which seemed better at time, since we got EarlyCSE support for free. However, it was proven that we can not do that there, the simplified-to PHI would not be reachable from the original PHI, and that is not something InstSimplify is allowed to do, as noted in the commit `ed90f15efb` that reverted it : > It appears to cause compilation non-determinism and caused stage3 mismatches. However InstCombine already does many different optimizations, so it should be a safe place to do it here. Note that we still can't just compare incoming values ranges, because there is no guarantee that these PHI's we'd simplify to were already re-visited and sorted. However coming up with a test is problematic. Effects on vanilla llvm test-suite + RawSpeed: ``` \| statistic name \| baseline \| proposed \| Δ \| % \| \|%\| \| \|----------------------------------------------------\|-----------\|-----------\|-------:\|---------:\|---------:\| \| instcombine.NumPHICSEs \| 0 \| 22228 \| 22228 \| 0.00% \| 0.00% \| \| asm-printer.EmittedInsts \| 7942329 \| 7942456 \| 127 \| 0.00% \| 0.00% \| \| assembler.ObjectBytes \| 254295632 \| 254313792 \| 18160 \| 0.01% \| 0.01% \| \| early-cse.NumCSE \| 2183283 \| 2183272 \| -11 \| 0.00% \| 0.00% \| \| early-cse.NumSimplify \| 550105 \| 541842 \| -8263 \| -1.50% \| 1.50% \| \| instcombine.NumAggregateReconstructionsSimplified \| 73 \| 4506 \| 4433 \| 6072.60% \| 6072.60% \| \| instcombine.NumCombined \| 3640311 \| 3666911 \| 26600 \| 0.73% \| 0.73% \| \| instcombine.NumDeadInst \| 1778204 \| 1783318 \| 5114 \| 0.29% \| 0.29% \| \| instcount.NumCallInst \| 1758395 \| 1758804 \| 409 \| 0.02% \| 0.02% \| \| instcount.NumInvokeInst \| 59478 \| 59502 \| 24 \| 0.04% \| 0.04% \| \| instcount.NumPHIInst \| 330557 \| 330549 \| -8 \| 0.00% \| 0.00% \| \| instcount.TotalBlocks \| 1077138 \| 1077221 \| 83 \| 0.01% \| 0.01% \| \| instcount.TotalFuncs \| 101442 \| 101441 \| -1 \| 0.00% \| 0.00% \| \| instcount.TotalInsts \| 8831946 \| 8832611 \| 665 \| 0.01% \| 0.01% \| \| simplifycfg.NumInvokes \| 4300 \| 4410 \| 110 \| 2.56% \| 2.56% \| \| simplifycfg.NumSimpl \| 1019813 \| 999740 \| -20073 \| -1.97% \| 1.97% \| ``` So it fires ~22k times, which is less than ~24k the take 1 did. It allows foldAggregateConstructionIntoAggregateReuse() to actually work after PHI-of-extractvalue folds did their thing. Previously SimplifyCFG would have done this PHI CSE, of all places. Additionally, allows some more `invoke`->`call` folds to happen (+110, +2.56%). All in all, expectedly, this catches less things overall, but all the motivational cases are still caught, so all good.	2020-08-29 13:13:06 +03:00
Nikita Popov	57a26bb7b4	[InstCombine] Fix typo in comment (NFC) As pointed out in post-commit review of D63060.	2020-08-29 10:17:17 +02:00
Nikita Popov	ffe05dd125	[InstCombine] usub.sat(a, b) + b => umax(a, b) (PR42178) Fixes https://bugs.llvm.org/show_bug.cgi?id=42178 by folding usub.sat(a, b) + b to umax(a, b). The backend will expand umax back to usubsat if that is profitable. We may also want to handle uadd.sat(a, b) - b in the future. Differential Revision: https://reviews.llvm.org/D63060	2020-08-28 21:52:29 +02:00
David Sherwood	f4257c5832	[SVE] Make ElementCount members private This patch changes ElementCount so that the Min and Scalable members are now private and can only be accessed via the get functions getKnownMinValue() and isScalable(). In addition I've added some other member functions for more commonly used operations. Hopefully this makes the class more useful and will reduce the need for calling getKnownMinValue(). Differential Revision: https://reviews.llvm.org/D86065	2020-08-28 14:43:53 +01:00
Roman Lebedev	95848ea101	[Value][InstCombine] Fix one-use checks in PHI-of-op -> Op-of-PHI[s] transforms to be one-user checks As FIXME said, they really should be checking for a single user, not use, so let's do that. It is not that unusual to have the same value as incoming value in a PHI node, not unlike how a PHI may have the same incoming basic block more than once. There isn't a nice way to do that, Value::users() isn't uniqified, and Value only tracks it's uses, not Users, so the check is potentially costly since it does indeed potentially involes traversing the entire use list of a value.	2020-08-26 20:20:41 +03:00
Roman Lebedev	1f90d45b9e	[InstCombine] PHI-of-extractvalues -> extractvalue-of-PHI, aka invokes are bad While since D86306 we do it's sibling fold for `insertvalue`, we should also do this for `extractvalue`'s. And unlike that one, the results here are, quite honestly, shocking, as it can be observed here on vanilla llvm test-suite + RawSpeed results: ``` \| statistic name \| baseline \| proposed \| Δ \| % \| \|%\| \| \|----------------------------------------------------\|-----------\|-----------\|--------:\|--------:\|-------:\| \| asm-printer.EmittedInsts \| 7945095 \| 7942507 \| -2588 \| -0.03% \| 0.03% \| \| assembler.ObjectBytes \| 273209920 \| 273069800 \| -140120 \| -0.05% \| 0.05% \| \| early-cse.NumCSE \| 2183363 \| 2183398 \| 35 \| 0.00% \| 0.00% \| \| early-cse.NumSimplify \| 541847 \| 550017 \| 8170 \| 1.51% \| 1.51% \| \| instcombine.NumAggregateReconstructionsSimplified \| 2139 \| 108 \| -2031 \| -94.95% \| 94.95% \| \| instcombine.NumCombined \| 3601364 \| 3635448 \| 34084 \| 0.95% \| 0.95% \| \| instcombine.NumConstProp \| 27153 \| 27157 \| 4 \| 0.01% \| 0.01% \| \| instcombine.NumDeadInst \| 1694521 \| 1765022 \| 70501 \| 4.16% \| 4.16% \| \| instcombine.NumPHIsOfExtractValues \| 0 \| 37546 \| 37546 \| 0.00% \| 0.00% \| \| instcombine.NumSunkInst \| 63158 \| 63686 \| 528 \| 0.84% \| 0.84% \| \| instcount.NumBrInst \| 874304 \| 871857 \| -2447 \| -0.28% \| 0.28% \| \| instcount.NumCallInst \| 1757657 \| 1758402 \| 745 \| 0.04% \| 0.04% \| \| instcount.NumExtractValueInst \| 45623 \| 11483 \| -34140 \| -74.83% \| 74.83% \| \| instcount.NumInsertValueInst \| 4983 \| 580 \| -4403 \| -88.36% \| 88.36% \| \| instcount.NumInvokeInst \| 61018 \| 59478 \| -1540 \| -2.52% \| 2.52% \| \| instcount.NumLandingPadInst \| 35334 \| 34215 \| -1119 \| -3.17% \| 3.17% \| \| instcount.NumPHIInst \| 344428 \| 331116 \| -13312 \| -3.86% \| 3.86% \| \| instcount.NumRetInst \| 100773 \| 100772 \| -1 \| 0.00% \| 0.00% \| \| instcount.TotalBlocks \| 1081154 \| 1077166 \| -3988 \| -0.37% \| 0.37% \| \| instcount.TotalFuncs \| 101443 \| 101442 \| -1 \| 0.00% \| 0.00% \| \| instcount.TotalInsts \| 8890201 \| 8833747 \| -56454 \| -0.64% \| 0.64% \| \| instsimplify.NumSimplified \| 75822 \| 75707 \| -115 \| -0.15% \| 0.15% \| \| simplifycfg.NumHoistCommonCode \| 24203 \| 24197 \| -6 \| -0.02% \| 0.02% \| \| simplifycfg.NumHoistCommonInstrs \| 48201 \| 48195 \| -6 \| -0.01% \| 0.01% \| \| simplifycfg.NumInvokes \| 2785 \| 4298 \| 1513 \| 54.33% \| 54.33% \| \| simplifycfg.NumSimpl \| 997332 \| 1018189 \| 20857 \| 2.09% \| 2.09% \| \| simplifycfg.NumSinkCommonCode \| 7088 \| 6464 \| -624 \| -8.80% \| 8.80% \| \| simplifycfg.NumSinkCommonInstrs \| 15117 \| 14021 \| -1096 \| -7.25% \| 7.25% \| ``` ... which tells us that this new fold fires whopping 38k times, increasing the amount of SimplifyCFG's `invoke`->`call` transforms by +54% (+1513) (again, D85787 did that last time), decreasing total instruction count by -0.64% (-56454), and sharply decreasing count of `insertvalue`'s (-88.36%, i.e. 9 times less) and `extractvalue`'s (-74.83%, i.e. four times less). This causes geomean -0.01% binary size decrease http://llvm-compile-time-tracker.com/compare.php?from=4d5ca22b8adfb6643466e4e9f48ba14bb48938bc&to=97dacca0111cb2ae678204e52a3cee00e3a69208&stat=size-text and, ignoring `O0-g`, is a geomean -0.01%..-0.05% compile-time improvement http://llvm-compile-time-tracker.com/compare.php?from=4d5ca22b8adfb6643466e4e9f48ba14bb48938bc&to=97dacca0111cb2ae678204e52a3cee00e3a69208&stat=instructions The other thing that tells is, is that while this is a massive win for `invoke`->`call` transform `InstCombinerImpl::foldAggregateConstructionIntoAggregateReuse()` fold, which is supposed to be dealing with such aggregate reconstructions, fires a lot less now. There are two reasons why: 1. After this fold, as it can be seen in tests, we may (will) end up with trivially redundant PHI nodes. We don't CSE them in InstCombine presently, which means that EarlyCSE needs to run and then InstCombine rerun. 2. But then, EarlyCSE not only manages to fold such redundant PHI's, it also sees that the extract-insert chain recreates the original aggregate, and replaces it with the original aggregate. The take-aways are 1. We maybe should do most trivial, same-BB PHI CSE in InstCombine 2. I need to check if what other patterns remain, and how they can be resolved. (i.e. i wonder if `foldAggregateConstructionIntoAggregateReuse()` might go away) This is a reland of the original commit `fcb51d8c24`, because originally i forgot to ensure that the base aggregate types match. Reviewed By: spatel Differential Revision: https://reviews.llvm.org/D86530	2020-08-26 09:57:50 +03:00
Roman Lebedev	c295c6f2c0	Revert "[InstCombine] PHI-of-extractvalues -> extractvalue-of-PHI, aka invokes are bad" This reverts commit `fcb51d8c24`. As buildbots report, there's apparently some missing check to ensure that the types of incoming values match the type of PHI. Let's revert for a moment.	2020-08-26 09:23:22 +03:00
Roman Lebedev	fcb51d8c24	[InstCombine] PHI-of-extractvalues -> extractvalue-of-PHI, aka invokes are bad While since D86306 we do it's sibling fold for `insertvalue`, we should also do this for `extractvalue`'s. And unlike that one, the results here are, quite honestly, shocking, as it can be observed here on vanilla llvm test-suite + RawSpeed results: ``` \| statistic name \| baseline \| proposed \| Δ \| % \| \|%\| \| \|----------------------------------------------------\|-----------\|-----------\|--------:\|--------:\|-------:\| \| asm-printer.EmittedInsts \| 7945095 \| 7942507 \| -2588 \| -0.03% \| 0.03% \| \| assembler.ObjectBytes \| 273209920 \| 273069800 \| -140120 \| -0.05% \| 0.05% \| \| early-cse.NumCSE \| 2183363 \| 2183398 \| 35 \| 0.00% \| 0.00% \| \| early-cse.NumSimplify \| 541847 \| 550017 \| 8170 \| 1.51% \| 1.51% \| \| instcombine.NumAggregateReconstructionsSimplified \| 2139 \| 108 \| -2031 \| -94.95% \| 94.95% \| \| instcombine.NumCombined \| 3601364 \| 3635448 \| 34084 \| 0.95% \| 0.95% \| \| instcombine.NumConstProp \| 27153 \| 27157 \| 4 \| 0.01% \| 0.01% \| \| instcombine.NumDeadInst \| 1694521 \| 1765022 \| 70501 \| 4.16% \| 4.16% \| \| instcombine.NumPHIsOfExtractValues \| 0 \| 37546 \| 37546 \| 0.00% \| 0.00% \| \| instcombine.NumSunkInst \| 63158 \| 63686 \| 528 \| 0.84% \| 0.84% \| \| instcount.NumBrInst \| 874304 \| 871857 \| -2447 \| -0.28% \| 0.28% \| \| instcount.NumCallInst \| 1757657 \| 1758402 \| 745 \| 0.04% \| 0.04% \| \| instcount.NumExtractValueInst \| 45623 \| 11483 \| -34140 \| -74.83% \| 74.83% \| \| instcount.NumInsertValueInst \| 4983 \| 580 \| -4403 \| -88.36% \| 88.36% \| \| instcount.NumInvokeInst \| 61018 \| 59478 \| -1540 \| -2.52% \| 2.52% \| \| instcount.NumLandingPadInst \| 35334 \| 34215 \| -1119 \| -3.17% \| 3.17% \| \| instcount.NumPHIInst \| 344428 \| 331116 \| -13312 \| -3.86% \| 3.86% \| \| instcount.NumRetInst \| 100773 \| 100772 \| -1 \| 0.00% \| 0.00% \| \| instcount.TotalBlocks \| 1081154 \| 1077166 \| -3988 \| -0.37% \| 0.37% \| \| instcount.TotalFuncs \| 101443 \| 101442 \| -1 \| 0.00% \| 0.00% \| \| instcount.TotalInsts \| 8890201 \| 8833747 \| -56454 \| -0.64% \| 0.64% \| \| instsimplify.NumSimplified \| 75822 \| 75707 \| -115 \| -0.15% \| 0.15% \| \| simplifycfg.NumHoistCommonCode \| 24203 \| 24197 \| -6 \| -0.02% \| 0.02% \| \| simplifycfg.NumHoistCommonInstrs \| 48201 \| 48195 \| -6 \| -0.01% \| 0.01% \| \| simplifycfg.NumInvokes \| 2785 \| 4298 \| 1513 \| 54.33% \| 54.33% \| \| simplifycfg.NumSimpl \| 997332 \| 1018189 \| 20857 \| 2.09% \| 2.09% \| \| simplifycfg.NumSinkCommonCode \| 7088 \| 6464 \| -624 \| -8.80% \| 8.80% \| \| simplifycfg.NumSinkCommonInstrs \| 15117 \| 14021 \| -1096 \| -7.25% \| 7.25% \| ``` ... which tells us that this new fold fires whopping 38k times, increasing the amount of SimplifyCFG's `invoke`->`call` transforms by +54% (+1513) (again, D85787 did that last time), decreasing total instruction count by -0.64% (-56454), and sharply decreasing count of `insertvalue`'s (-88.36%, i.e. 9 times less) and `extractvalue`'s (-74.83%, i.e. four times less). This causes geomean -0.01% binary size decrease http://llvm-compile-time-tracker.com/compare.php?from=4d5ca22b8adfb6643466e4e9f48ba14bb48938bc&to=97dacca0111cb2ae678204e52a3cee00e3a69208&stat=size-text and, ignoring `O0-g`, is a geomean -0.01%..-0.05% compile-time improvement http://llvm-compile-time-tracker.com/compare.php?from=4d5ca22b8adfb6643466e4e9f48ba14bb48938bc&to=97dacca0111cb2ae678204e52a3cee00e3a69208&stat=instructions The other thing that tells is, is that while this is a massive win for `invoke`->`call` transform `InstCombinerImpl::foldAggregateConstructionIntoAggregateReuse()` fold, which is supposed to be dealing with such aggregate reconstructions, fires a lot less now. There are two reasons why: 1. After this fold, as it can be seen in tests, we may (will) end up with trivially redundant PHI nodes. We don't CSE them in InstCombine presently, which means that EarlyCSE needs to run and then InstCombine rerun. 2. But then, EarlyCSE not only manages to fold such redundant PHI's, it also sees that the extract-insert chain recreates the original aggregate, and replaces it with the original aggregate. The take-aways are 1. We maybe should do most trivial, same-BB PHI CSE in InstCombine 2. I need to check if what other patterns remain, and how they can be resolved. (i.e. i wonder if `foldAggregateConstructionIntoAggregateReuse()` might go away) Reviewed By: spatel Differential Revision: https://reviews.llvm.org/D86530	2020-08-26 09:08:24 +03:00
Sanjay Patel	c4f0a0896f	[InstCombine] improve demanded element analysis for vector insert-of-extract (2nd try) The 1st attempt (rG557b890) was reverted because it caused miscompiles. That bug is avoided here by changing the order of folds and as verified in the new tests. Original commit message: InstCombine currently has odd rules for folding insert-extract chains to shuffles, so we miss collapsing seemingly simple cases as shown in the tests here. But poison makes this not quite as easy as we might have guessed. Alive2 tests to show the subtle difference (similar to the regression tests): https://alive2.llvm.org/ce/z/hp4hv3 (this is ok) https://alive2.llvm.org/ce/z/ehEWaN (poison leakage) SLP tends to create these patterns (as shown in the SLP tests), and this could help with solving PR16739. Differential Revision: https://reviews.llvm.org/D86460	2020-08-25 11:19:36 -04:00
Benjamin Kramer	c6fb72de4f	Revert "[InstCombine] improve demanded element analysis for vector insert-of-extract" This reverts commit `557b890ff4`. Causing miscompiles, test case is on llvm-commits.	2020-08-25 11:31:31 +02:00
David Sherwood	7b64765cd1	[SVE] Fix TypeSize related warnings with IR truncates of scalable vectors In getCastInstrCost when the instruction is a truncate we were relying upon the implicit TypeSize -> uint64_t cast when asking if a given type has the same size as a legal integer. I've changed the code to only ask the question if the type is fixed length. I have also changed InstCombinerImpl::SimplifyDemandedUseBits to bail out for now if the type is a scalable vector. I've added the following new tests: Analysis/CostModel/AArch64/sve-trunc.ll Transforms/InstCombine/AArch64/sve-trunc.ll for both of these fixes. Differential revision: https://reviews.llvm.org/D86432	2020-08-25 09:17:56 +01:00
Roman Lebedev	cdd339c568	[InstCombine] PHI-of-insertvalues -> insertvalue-of-PHI's As per statistic, this happens pretty exceedingly rare, but i have seen it in exactly the situations the Phi-aware aggregate reconstruction would have handled, eventually, and allowed invoke -> call fold later on. So while this might be something that other fold will have to learn about, i believe we should be doing this transform in general. Here, we are okay with adding two PHI's to get both the base aggregate, and the inserted value. I'm not sure it makes much sense to restrict it to a single phi (to just the inserted value?), because originally we'd be receiving the final aggregate already.. llvm test-suite + RawSpeed: ``` \| statistic name \| baseline \| proposed \| Δ \| % \| \\|%\\| \| \|--------------------------------------------\|-----------\|-----------\|-----:\|-------:\|------:\| \| instcombine.NumPHIsOfInsertValues \| 0 \| 12 \| 12 \| 0.00% \| 0.00% \| \| asm-printer.EmittedInsts \| 8926643 \| 8926595 \| -48 \| 0.00% \| 0.00% \| \| instcombine.NumCombined \| 3846614 \| 3846640 \| 26 \| 0.00% \| 0.00% \| \| instcombine.NumConstProp \| 24302 \| 24293 \| -9 \| -0.04% \| 0.04% \| \| instcombine.NumDeadInst \| 1620140 \| 1620112 \| -28 \| 0.00% \| 0.00% \| \| instcount.NumBrInst \| 898466 \| 898464 \| -2 \| 0.00% \| 0.00% \| \| instcount.NumCallInst \| 1760819 \| 1760875 \| 56 \| 0.00% \| 0.00% \| \| instcount.NumExtractValueInst \| 45659 \| 45649 \| -10 \| -0.02% \| 0.02% \| \| instcount.NumInsertValueInst \| 4991 \| 4981 \| -10 \| -0.20% \| 0.20% \| \| instcount.NumIntToPtrInst \| 27084 \| 27087 \| 3 \| 0.01% \| 0.01% \| \| instcount.NumPHIInst \| 371435 \| 371429 \| -6 \| 0.00% \| 0.00% \| \| instcount.NumStoreInst \| 906011 \| 906019 \| 8 \| 0.00% \| 0.00% \| \| instcount.TotalBlocks \| 1105520 \| 1105518 \| -2 \| 0.00% \| 0.00% \| \| instcount.TotalInsts \| 9795737 \| 9795776 \| 39 \| 0.00% \| 0.00% \| \| simplifycfg.NumInvokes \| 2784 \| 2786 \| 2 \| 0.07% \| 0.07% \| \| simplifycfg.NumSimpl \| 1001840 \| 1001850 \| 10 \| 0.00% \| 0.00% \| \| simplifycfg.NumSinkCommonInstrs \| 15174 \| 15170 \| -4 \| -0.03% \| 0.03% \| ``` Reviewed By: spatel Differential Revision: https://reviews.llvm.org/D86306	2020-08-25 10:38:11 +03:00
Sanjay Patel	557b890ff4	[InstCombine] improve demanded element analysis for vector insert-of-extract InstCombine currently has odd rules for folding insert-extract chains to shuffles, so we miss collapsing seemingly simple cases as shown in the tests here. But poison makes this not quite as easy as we might have guessed. Alive2 tests to show the subtle difference (similar to the regression tests): https://alive2.llvm.org/ce/z/hp4hv3 (this is ok) https://alive2.llvm.org/ce/z/ehEWaN (poison leakage) SLP tends to create these patterns (as shown in the SLP tests), and this could help with solving PR16739. Differential Revision: https://reviews.llvm.org/D86460	2020-08-24 17:00:16 -04:00
Roman Lebedev	56c529300e	[NFC][InstCombine] Adjust naming for some methods to match coding standards Requested as preparatory cleanup in https://reviews.llvm.org/D86306#inline-799065	2020-08-24 22:39:34 +03:00
Sanjay Patel	6a44edb8da	[InstCombine] fold abs of select with negated op (PR39474) Similar to the existing transform - peek through a select to match a value and its negation. https://alive2.llvm.org/ce/z/MXi5KG define i8 @src(i1 %b, i8 %x) { %0: %neg = sub i8 0, %x %sel = select i1 %b, i8 %x, i8 %neg %abs = abs i8 %sel, 1 ret i8 %abs } => define i8 @tgt(i1 %b, i8 %x) { %0: %abs = abs i8 %x, 1 ret i8 %abs } Transformation seems to be correct!	2020-08-24 07:37:55 -04:00
Roman Lebedev	f6decfa36d	[InstCombine] Negator: freeze is freely negatible if it's operand is negatible	2020-08-23 23:28:19 +03:00
Sanjay Patel	ec06b38130	[InstCombine] canonicalize 'not' ops before logical shifts This reverses the existing transform that would uniformly canonicalize any 'xor' after any shift. In the case of logical shifts, that turns a 'not' into an arbitrary 'xor' with constant, and that's probably not as good for analysis, SCEV, or codegen. The SCEV motivating case is discussed in: http://bugs.llvm.org/PR47136 There's an analysis motivating case at: http://bugs.llvm.org/PR38781 I did draft a patch that would do the same for 'ashr' but that's questionable because it's just swapping the position of a 'not' and uncovers at least 2 missing folds that we would probably need to deal with as preliminary steps. Alive proofs: https://rise4fun.com/Alive/BBV Name: shift right of 'not' Pre: C2 == (-1 u>> C1) %a = lshr i8 %x, C1 %r = xor i8 %a, C2 => %n = xor i8 %x, -1 %r = lshr i8 %n, C1 Name: shift left of 'not' Pre: C2 == (-1 << C1) %a = shl i8 %x, C1 %r = xor i8 %a, C2 => %n = xor i8 %x, -1 %r = shl i8 %n, C1 Name: ashr of 'not' %a = ashr i8 %x, C1 %r = xor i8 %a, -1 => %n = xor i8 %x, -1 %r = ashr i8 %n, C1 Differential Revision: https://reviews.llvm.org/D86243	2020-08-22 09:38:13 -04:00
Serguei Katkov	9e362bb0eb	[InstCombine] Remove unused entries in gc-live bundle of statepoint If some of gc live value are not used in gc.relocate we can remove them from gc-live bundle of statepoint instruction. Also the CL removes duplicated Values in gc-live bundle. Reviewers: reames, dantrushin Reviewed By: dantrushin Subscribers: llvm-commits Differential Revision: https://reviews.llvm.org/D85959	2020-08-22 01:36:22 +07:00
Serguei Katkov	63d9d56a55	[InstCombine] Move handling of gc.relocate in a gc.statepoint The only def for gc.relocate is a gc.statepoint. But real dependency is not described by def-use chain. Instead this dependency is encoded by indecies of operands in gc-live bundle of statepoint as integer constants in gc.relocate. InstCombine operates by def-use chain. As a result when value in gc-live bundle is simplified the gc.statepoint itself is not simplified but it might simplify dependent gc.relocates. To trigger the optimization of gc.relocate we now unconditionally trigger check of all dependent gc.relocates by adding them to worklist. This CL handles of gc.relocates as process of gc.statepoint optimization considering gc.statepoint and related gc.relocate as whole entity. Reviewers: reames, dantrushin Reviewed By: reames Subscribers: llvm-commits Differential Revision: https://reviews.llvm.org/D85954	2020-08-21 23:44:23 +07:00
Sanjay Patel	6f3511a01a	[ValueTracking] define/use max recursion depth in header There's a potential motivating case to increase this limit in PR47191: http://bugs.llvm.org/PR47191 But first we should make it less hacky. The limit in InstCombine is directly tied to this value because an increase there can cause asserts in the underlying value tracking calls if not changed together. The usage in VectorUtils is independent, but the comment suggests that we should use the same value unless there's a known reason to diverge. There are similar limits in codegen analysis, but I think we should leave those independent in case we intentionally want the optimization power/cost to be different there. Differential Revision: https://reviews.llvm.org/D86113	2020-08-19 16:56:59 -04:00
Sanjay Patel	c8d711adae	[InstCombine] reduce code duplication; NFC	2020-08-19 12:05:12 -04:00
Benjamin Kramer	b98e25b6d7	Make helpers static. NFC.	2020-08-19 16:00:03 +02:00
Roman Lebedev	3d76a133c7	Revert "[InstCombine] Lower infinite combine loop detection thresholds" And as being reported by Florian Hahn, there's a hit in MultiSource/Benchmarks/mafft from the test-suite on X86 with -O3 -flto, so reverting until addressed. This reverts commit `71e0b82c9f`.	2020-08-19 16:53:30 +03:00
Roman Lebedev	71e0b82c9f	[InstCombine] Lower infinite combine loop detection thresholds It's been a month since `2f3862eb9f`, and no new bug reports about the threshold were filled, so let's bump it again and wait again.	2020-08-19 14:37:57 +03:00
Roman Lebedev	2f01785857	[NFC][InstCombine] Aggregate reconstruction: use plain map Now that we no longer require for this map to have stable iteration order, we no longer need to pay for keeping the iteration order stable, so switch from `SmallMapVector` to `SmallDenseMap`.	2020-08-19 01:09:25 +03:00
Roman Lebedev	78bd4231bf	[InstCombine] PHI-aware aggregate reconstruction: properly handle duplicate predecessors While it may seem like we can just "deduplicate" the case where some basic block happens to be a predecessor more than once, which happens for e.g. switches, that is not correct thing to do. We must actually add a PHI operand for each predecessor. This was initially reported to me by David Major as a clang crash during gecko build for android.	2020-08-19 01:00:42 +03:00
Sanjay Patel	139da9c4d7	[InstCombine] fold fabs of select with negated operand This is the FP example shown in: https://bugs.llvm.org/PR39474	2020-08-18 09:23:07 -04:00

1 2 3 4 5 ...

4011 Commits