forked from OSchip/llvm-project
7 Commits
| Author | SHA1 | Message | Date |
|---|---|---|---|
|
|
bf21ce7b90
|
[InstCombine] Take 3: Perform trivial PHI CSE
The original take 1 was |
|
|
|
bdaa3f86a0
|
Revert "[InstCombine] Take 2: Perform trivial PHI CSE"
While the original variant with doing this in InstSimplify (rightfully)
caused questions and ultimately was detected to be a culprit
of stage2-stage3 mismatch, it was expected that
InstCombine-based implementation would be fine.
But apparently it's not, as
http://lab.llvm.org:8011/builders/clang-with-thin-lto-ubuntu/builds/24095/steps/compare-compilers/logs/stdio
suggests.
Which suggests that somewhere in InstCombine there is a loop
over nondeterministically sorted container, which causes
different worklist ordering.
This reverts commit
|
|
|
|
3e69871ab5
|
[InstCombine] Take 2: Perform trivial PHI CSE
The original take was |
|
|
|
ed90f15efb |
Revert "[InstSimplify][EarlyCSE] Try to CSE PHI nodes in the same basic block"
This reverts commit
|
|
|
|
6102310d81
|
[InstSimplify][EarlyCSE] Try to CSE PHI nodes in the same basic block
Apparently, we don't do this, neither in EarlyCSE, nor in InstSimplify, nor in (old) GVN, but do in NewGVN and SimplifyCFG of all places.. While i could teach EarlyCSE how to hash PHI nodes, we can't really do much (anything?) even if we find two identical PHI nodes in different basic blocks, same-BB case is the interesting one, and if we teach InstSimplify about it (which is what i wanted originally, https://reviews.llvm.org/D86530), we get EarlyCSE support for free. So i would think this is pretty uncontroversial. On vanilla llvm test-suite + RawSpeed, this has the following effects: ``` | statistic name | baseline | proposed | Δ | % | \|%\| | |----------------------------------------------------|-----------|-----------|-------:|---------:|---------:| | instsimplify.NumPHICSE | 0 | 23779 | 23779 | 0.00% | 0.00% | | asm-printer.EmittedInsts | 7942328 | 7942392 | 64 | 0.00% | 0.00% | | assembler.ObjectBytes | 273069192 | 273084704 | 15512 | 0.01% | 0.01% | | correlated-value-propagation.NumPhis | 18412 | 18539 | 127 | 0.69% | 0.69% | | early-cse.NumCSE | 2183283 | 2183227 | -56 | 0.00% | 0.00% | | early-cse.NumSimplify | 550105 | 542090 | -8015 | -1.46% | 1.46% | | instcombine.NumAggregateReconstructionsSimplified | 73 | 4506 | 4433 | 6072.60% | 6072.60% | | instcombine.NumCombined | 3640264 | 3664769 | 24505 | 0.67% | 0.67% | | instcombine.NumDeadInst | 1778193 | 1783183 | 4990 | 0.28% | 0.28% | | instcount.NumCallInst | 1758401 | 1758799 | 398 | 0.02% | 0.02% | | instcount.NumInvokeInst | 59478 | 59502 | 24 | 0.04% | 0.04% | | instcount.NumPHIInst | 330557 | 330533 | -24 | -0.01% | 0.01% | | instcount.TotalInsts | 8831952 | 8832286 | 334 | 0.00% | 0.00% | | simplifycfg.NumInvokes | 4300 | 4410 | 110 | 2.56% | 2.56% | | simplifycfg.NumSimpl | 1019808 | 999607 | -20201 | -1.98% | 1.98% | ``` I.e. it fires ~24k times, causes +110 (+2.56%) more `invoke` -> `call` transforms, and counter-intuitively results in *more* instructions total. That being said, the PHI count doesn't decrease that much, and looking at some examples, it seems at least some of them were previously getting PHI CSE'd in SimplifyCFG of all places.. I'm adjusting `Instruction::isIdenticalToWhenDefined()` at the same time. As a comment in `InstCombinerImpl::visitPHINode()` already stated, there are no guarantees on the ordering of the operands of a PHI node, so if we just naively compare them, we may false-negatively say that the nodes are not equal when the only difference is operand order, which is especially important since the fold is in InstSimplify, so we can't rely on InstCombine sorting them beforehand. Fixing this for the general case is costly (geomean +0.02%), and does not appear to catch anything in test-suite, but for the same-BB case, it's trivial, so let's fix at least that. As per http://llvm-compile-time-tracker.com/compare.php?from=04879086b44348cad600a0a1ccbe1f7776cc3cf9&to=82bdedb888b945df1e9f130dd3ac4dd3c96e2925&stat=instructions this appears to cause geomean +0.03% compile time increase (regression), but geomean -0.01%..-0.04% code size decrease (improvement). |
|
|
|
2655a70a04
|
[InstCombine] After merging store into successor, queue prev. store to be visited (PR46661)
We can happen to have a situation with many stores eligible for transform, but due to our visitation order (top to bottom), when we have processed the first eligible instruction, we would not try to reprocess the previous instructions that are now also eligible. So after we've successfully merged a store that was second-to-last instruction into successor, if the now-second-to-last instruction is also a such store that is eligible, add it to worklist to be revisited. Fixes https://bugs.llvm.org/show_bug.cgi?id=46661 |
|
|
|
ef0ecb7b03
|
[NFCI][InstCombine] PR46661: multiple stores eligible for merging into successor - worklist issue
The testcase should pass with a single instcombine iteration. |