Commit Graph

393 Commits

Author SHA1 Message Date
Simon Pilgrim b11eaf5617 [DSE] Don't dereference a dyn_cast<> result - use cast<> instead. NFCI.
We were relying on the dyn_cast<> succeeding - better use cast<> and have it assert that its the correct type than dereference a null result.
2020-11-08 13:07:45 +00:00
Florian Hahn aab71d4443 [DSE] Use same logic as legacy impl to check if free kills a location.
This patch updates DSE + MemorySSA to use the same check as the legacy
implementation to determine if a location is killed by a free call.

This changes the existing behavior so that a free does not kill
locations before the start of the freed pointer.

This should fix PR48036.
2020-10-31 20:09:25 +00:00
Evgeniy Brevnov 3d31adaec4 [DSE] Improve partial overlap detection
Currently isOverwrite returns OW_MaybePartial even for accesss known not to overlap. This is not a big problem for legacy implementation (since isPartialOverwrite follows isOverwrite and clarifies the result). Contrary SSA based version does a lot of work to later find out that accesses don't overlap. Besides negative impact on compile time we quickly reach MemorySSAPartialStoreLimit and miss optimization opportunities.

Note: In fact, I think it would be cleaner implementation if isOverwrite returned fully clarified result in the first place whithout need to call isPartialOverwrite. This can be done as a follow up. What do you think?

Reviewed By: fhahn, asbirlea

Differential Revision: https://reviews.llvm.org/D90371
2020-10-30 22:23:20 +07:00
Florian Hahn 05e4f7bde9 [DSE] Remove noop stores after killing stores for a MemoryDef.
Currently we fail to eliminate some noop stores if there is a kill-able
store between the starting def and the load. This is because we
eliminate noop stores first.

In practice it seems like eliminating noop stores after the main
elimination for a def covers slightly more cases.

This patch improves the number of stores slightly in 2 cases for X86 -O3
-flto

Same hash: 235 (filtered out)
Remaining: 2
Metric: dse.NumRedundantStores

Program                                          base      patch diff
 test-suite...ce/Benchmarks/PAQ8p/paq8p.test     2.00   3.00 50.0%
 test-suite...006/453.povray/453.povray.test    18.00  21.00 16.7%

There might be other phase ordering issues, but it appears that they do
not show up in the test-suite/SPEC2000/SPEC2006. We can always tune the
ordering later.

Partly fixes PR47887.

Reviewed By: asbirlea, zoecarver

Differential Revision: https://reviews.llvm.org/D89650
2020-10-30 09:40:15 +00:00
Florian Hahn b82f80057d [DSE] Use walker to skip noalias stores between current & clobber def.
Instead of getting the defining access we should be able to use
getClobberingMemoryAccess to skip non-aliasing MemoryDefs. No additional
checks should be needed, because we only remove the starting def if it
matches the defining access of the load. All we need to worry about is
that there are no (may)alias stores between the starting def and the
load and getClobberingMemoryAccess should guarantee that.

Partly fixes PR47887.

This improves the number of redundant stores removed in some cases
(numbers below for MultiSource, SPEC2000, SPEC2006 on X86 with -flto
-O3).

Same hash: 226 (filtered out)
Remaining: 11
Metric: dse.NumRedundantStores

Program                                        base   patch1 diff
 test-suite...:: External/Povray/povray.test     1.00   5.00 400.0%
 test-suite...chmarks/MallocBench/gs/gs.test     1.00   3.00 200.0%
 test-suite...0/253.perlbmk/253.perlbmk.test    21.00  37.00 76.2%
 test-suite...0.perlbench/400.perlbench.test    24.00  37.00 54.2%
 test-suite.../Applications/SPASS/SPASS.test     3.00   4.00 33.3%
 test-suite...006/453.povray/453.povray.test    15.00  18.00 20.0%
 test-suite...T2006/445.gobmk/445.gobmk.test    27.00  29.00  7.4%
 test-suite.../CINT2006/403.gcc/403.gcc.test   136.00 137.00  0.7%
 test-suite.../CINT2000/176.gcc/176.gcc.test     6.00   6.00  0.0%
 test-suite.../Benchmarks/Bullet/bullet.test    NaN     3.00  nan%
 test-suite.../Benchmarks/Ptrdist/bc/bc.test    NaN     1.00  nan%

Reviewed By: asbirlea

Differential Revision: https://reviews.llvm.org/D89647
2020-10-28 11:01:25 +00:00
Florian Hahn 2e58010208 [DSE] Do not scan users of memory terminators for further reads.
isMemTerminator checks if the current def is a memory terminator that
terminates the memory pointed to by DefLoc. We do not have to add any of
their users to the worklist, because the follow-on users cannot read the
memory in question.

This leads to more stores eliminated in the presence of lifetime calls.
Previously we added the users of those intrinsics to the worklist,
limiting elimination.

In terms of removed stores, this gives a nice boost on some benchmarks
(MultiSource/SPEC2000/SPEC2006 on X86 with -flto -O3):

Same hash: 205 (filtered out)
Remaining: 32
Metric: dse.NumFastStores

Program                                          base   patch   diff
 test-suite...000/197.parser/197.parser.test     4.00    8.00  100.0%
 test-suite...rolangs-C++/family/family.test     4.00    7.00  75.0%
 test-suite...marks/7zip/7zip-benchmark.test   1722.00 2189.00 27.1%
 test-suite...CFP2000/177.mesa/177.mesa.test    30.00   38.00  26.7%
 test-suite :: External/Nurbs/nurbs.test        44.00   49.00  11.4%
 test-suite...lications/sqlite3/sqlite3.test   115.00  128.00  11.3%
 test-suite...006/447.dealII/447.dealII.test   2715.00 3013.00 11.0%
 test-suite...ProxyApps-C++/CLAMR/CLAMR.test   237.00  261.00  10.1%
 test-suite...tions/lambda-0.1.3/lambda.test    40.00   44.00  10.0%
 test-suite...3.xalancbmk/483.xalancbmk.test   1366.00 1475.00  8.0%
 test-suite...abench/jpeg/jpeg-6a/cjpeg.test    13.00   14.00   7.7%
 test-suite...oxyApps-C++/miniFE/miniFE.test    43.00   46.00   7.0%
 test-suite...lications/ClamAV/clamscan.test   230.00  246.00   7.0%
 test-suite...006/450.soplex/450.soplex.test   284.00  299.00   5.3%
 test-suite...nsumer-jpeg/consumer-jpeg.test    21.00   22.00   4.8%
2020-10-20 16:55:22 +01:00
Florian Hahn 6439fde6d4 [DSE] Bail out from getLocForWriteEx if call is not argmemonly/inacc_mem.
This change should currently not have any impact, but guard against
further inconsistencies between MemoryLocation and function attributes.
2020-10-20 14:37:53 +01:00
Florian Hahn f5cf7f544b [DSE] Do not consider 'noop' intrinsics as read-clobbers.
isNoopIntrinsic returns true for some intrinsics that are modeled in
MemorySSA but do not actually read or write any memory and do not block
DSE. Such intrinsics should not be considered as read-clobbers.
2020-10-18 15:51:05 +01:00
Florian Hahn 51ff04567b Recommit "[DSE] Switch to MemorySSA-backed DSE by default."
After investigation by @asbirlea, the issue that caused the
revert appears to be an issue in the original source, rather
than a problem with the compiler.

This patch enables MemorySSA DSE again.

This reverts commit 915310bf14.
2020-10-16 09:02:53 +01:00
zoecarver 6c25816d7b [DSE] Look through memory PHI arguments when removing noop stores in MSSA.
Summary:
Adds support for "following" memory through MSSA PHI arguments. This will help catch more noop stores that exist between blocks.

Originally part of D79391.

Reviewers: fhahn, jfb, asbirlea

Differential Revision: https://reviews.llvm.org/D82588
2020-10-01 10:42:02 -07:00
Florian Hahn 915310bf14 Revert "[DSE] Switch to MemorySSA-backed DSE by default."
There appears to be a mis-compile with MemorySSA-backed DSE in
combination with llvm.lifetime.end. It currently appears like
DSE is doing the right thing and the llvm.lifetime.end markers
are incorrect. The reverted patch uncovers the mis-compile.

This patch temporarily switches back to the legacy DSE
implementation, while we investigate.

This reverts commit 9d172c8e9c.
2020-09-26 18:35:27 +01:00
Florian Hahn 8f0466edc0 [DSE] Unify & fix mem terminator location checks.
When looking for memory defs killed by memory terminators the code
currently incorrectly ignores the size argument of llvm.lifetime.end.

This patch updates the code to use isMemTerminator and updates
isMemTerminator to use isOverwrite() to make sure locations that are
outside the range marked as dead by llvm.lifetime.end are not
considered. Note that isOverwrite is only used for llvm.lifetime.end,
because free-like functions make the whole underlying object dead.
2020-09-26 13:47:50 +01:00
Florian Hahn 9d172c8e9c Recommit "[DSE] Switch to MemorySSA-backed DSE by default."
This switches to using DSE + MemorySSA by default again, after
fixing the issues reported after the first commit.

Notable fixes fc82006331, a0017c2bc2.

This reverts commit 3a59628f3c.
2020-09-18 11:05:00 +01:00
Florian Hahn 3a59628f3c Revert "[DSE] Switch to MemorySSA-backed DSE by default."
This reverts commit fb109c42d9.

Temporarily revert due to a mis-compile pointed out at D87163.
2020-09-15 18:07:56 +01:00
Florian Hahn f715d81c9d [DSE] Only eliminate candidates that always store the same loc.
AliasAnalysis/MemoryLocation does not account for loops. Two
MemoryLocation can be must-overwrite, even if the first one writes
multiple locations in a loop.

This patch prevents removing such stores, by only considering candidates
that are known to be loop invariant, or executed in the same BB.

Currently the invariant check is quite conservative and only considers
Alloca and Alloca-like instructions and arguments as invariant base pointers.
It also considers GEPs with all constant indices and invariant bases as
invariant.

This can be improved in the future, but the current implementation has
only minor impact on the total number of stores eliminated (25903 vs
26047 for the baseline). There are some 2-10% swings for some individual
benchmarks. In roughly half of the cases, the number of stores removed
increases actually, because we skip candidates that are unlikely to be
valid candidates early.
2020-09-14 12:06:58 +01:00
Florian Hahn e082dee2b5 [DSE] Bail out on MemoryPhis when deleting stores at end of function.
When deleting stores at the end of a function, we have to do PHI
translation, otherwise we might miss reads in different iterations of a
loop. See multiblock-loop-carried-dependence.ll for details.

This fixes a mis-compile and surprisingly also increases the number of
eliminated stores from 26047 to 26572 for MultiSource/SPEC2000/SPEC2006
on X86 with -O3 -flto. This is most likely because we save budget by not
exploring through MemoryPhis, which are less likely to result in valid
candidates for elimination.

The issue was reported post-commit for fb109c42d9.
2020-09-12 19:05:59 +01:00
Krzysztof Parzyszek f92908cc74 [DSE] Make sure that DSE+MSSA can handle masked stores
Differential Revision: https://reviews.llvm.org/D87414
2020-09-11 10:00:21 -05:00
Florian Hahn fb109c42d9 [DSE] Switch to MemorySSA-backed DSE by default.
The tests have been updated and I plan to move them from the MSSA
directory up.

Some end-to-end tests needed small adjustments. One difference to the
legacy DSE is that legacy DSE also deletes trivially dead instructions
that are unrelated to memory operations. Because MemorySSA-backed DSE
just walks the MemorySSA, we only visit/check memory instructions. But
removing unrelated dead instructions is not really DSE's job and other
passes will clean up.

One noteworthy change is in llvm/test/Transforms/Coroutines/ArgAddr.ll,
but I think this comes down to legacy DSE not handling instructions that
may throw correctly in that case. To cover this with MemorySSA-backed
DSE, we need an update to llvm.coro.begin to treat it's return value to
belong to the same underlying object as the passed pointer.

There are some minor cases MemorySSA-backed DSE currently misses, e.g. related
to atomic operations, but I think those can be implemented after the switch.

This has been discussed on llvm-dev:
http://lists.llvm.org/pipermail/llvm-dev/2020-August/144417.html

For the MultiSource/SPEC2000/SPEC2006 the number of eliminated stores
goes from ~17500 (legayc DSE) to ~26300 (MemorySSA-backed). More numbers
and details in the thread on llvm-dev.

Impact on CTMark:
```
                                     Legacy Pass Manager
                        exec instrs    size-text
O3                       + 0.60%        - 0.27%
ReleaseThinLTO           + 1.00%        - 0.42%
ReleaseLTO-g.            + 0.77%        - 0.33%
RelThinLTO (link only)   + 0.87%        - 0.42%
RelLO-g (link only)      + 0.78%        - 0.33%
```
http://llvm-compile-time-tracker.com/compare.php?from=3f22e96d95c71ded906c67067d75278efb0a2525&to=ae8be4642533ff03803967ee9d7017c0d73b0ee0&stat=instructions
```
                                     New Pass Manager
                       exec instrs.   size-text
O3                       + 0.95%       - 0.25%
ReleaseThinLTO           + 1.34%       - 0.41%
ReleaseLTO-g.            + 1.71%       - 0.35%
RelThinLTO (link only)   + 0.96%       - 0.41%
RelLO-g (link only)      + 2.21%       - 0.35%
```
http://195.201.131.214:8000/compare.php?from=3f22e96d95c71ded906c67067d75278efb0a2525&to=ae8be4642533ff03803967ee9d7017c0d73b0ee0&stat=instructions

Reviewed By: asbirlea, xbolva00, nikic

Differential Revision: https://reviews.llvm.org/D87163
2020-09-10 22:24:32 +01:00
Florian Hahn a5ec99da6e [DSE] Support eliminating memcpy.inline.
MemoryLocation has been taught about memcpy.inline, which means we can
get the memory locations read and written by it. This means DSE can
handle memcpy.inline
2020-09-10 13:19:25 +01:00
Florian Hahn 9969c317ff [DSE,MemorySSA] Handle atomic stores explicitly in isReadClobber.
Atomic stores are modeled as MemoryDef to model the fact that they may
not be reordered, depending on the ordering constraints.

Atomic stores that are monotonic or weaker do not limit re-ordering, so
we do not have to treat them as potential read clobbers.

Note that llvm/test/Transforms/DeadStoreElimination/MSSA/atomic.ll
already contains a set of negative test cases.

Reviewed By: asbirlea

Differential Revision: https://reviews.llvm.org/D87386
2020-09-09 23:01:58 +01:00
Krzysztof Parzyszek 81ff2d30a9 [DSE] Handle masked stores 2020-09-09 13:31:31 -05:00
Florian Hahn c7b7c32f4a [DSE,MemorySSA] Increase walker limit a bit.
This slightly bumps the walker limit so that it covers more cases while
not increasing compile-time too much:
http://llvm-compile-time-tracker.com/compare.php?from=0fc1c2b51ba0cfb9145139af35be638333865251&to=91144a50ea4fa82c0c877e77784f60371640b263&stat=instructions
2020-09-08 14:55:46 +01:00
Florian Hahn efb8e156da [DSE,MemorySSA] Add an early check for read clobbers to traversal.
Depending on the benchmark, this early exit can save a substantial
amount of compile-time:

http://llvm-compile-time-tracker.com/compare.php?from=505f2d817aa8e07ba98e5fd4a8f6ff0666f89df1&to=eb4e441147f9b4b7a5fcbbc57428cadbe9e01f10&stat=instructions
2020-09-07 23:22:10 +01:00
Florian Hahn 16bb71fd4f [DSE,MemorySSA] Add a few additional debug messages. 2020-09-06 20:31:00 +01:00
Florian Hahn 00eb6fef08 [DSE,MemorySSA] Check for throwing instrs between killing/killed def.
We also have to check all uses between the killing & killed def and
check if any of them is throwing.
2020-09-04 18:54:59 +01:00
Florian Hahn 86d817d7cf [DSE,MemorySSA] Skip defs without analyzable write locations.
Similar to other checks above, if there is no write location for a def,
it cannot be considered for elimination and can be skipped.
2020-08-30 21:56:25 +01:00
Florian Hahn 42c57c294d [DSE,MemorySSA] Simplify code, EarlierAccess is be a MemoryDef (NFC).
After recent changes, we return early if Current is a MemoryPhi, so
EarlierAccess can only be a MemoryDef.
2020-08-30 21:31:57 +01:00
Florian Hahn 31cdb29de4 [DSE,MemorySSA] Return early when hitting a MemoryPhi.
A MemoryPhi can never be eliminated. If we hit one, return the Phi, so
the caller can continue traversing the incoming accesses.

This saves some unnecessary read clobber checks and improves
compile-time
http://llvm-compile-time-tracker.com/compare.php?from=1ffc58b6d098ce8fa71f3a80fe75b990f633f921&to=d0fa8d1982380b57d7b6067528104bc373dbe07a&stat=instructions
2020-08-29 18:28:26 +01:00
Florian Hahn 43aa7227df [DSE,MemorySSA] Check if Current is valid for elimination first.
This changes getDomMemoryDef to check if a Current is a valid
candidate for elimination before checking for reads. Before the change,
we were spending a lot of compile-time in checking for read accesses for
Current that might not even be removable.

This patch flips the logic, so we skip Current if they cannot be
removed before checking all their uses. This is much more efficient in
practice.

It also adds a more aggressive limit for checking partially overlapping
stores. The main problem with overlapping stores is that we do not know
if they will lead to elimination until seeing all of them. This patch
limits adds a new limit for overlapping store candidates, which keeps
the number of modified overlapping stores roughly the same.

This is another substantial compile-time improvement (while also
increasing the number of stores eliminated). Geomean -O3 -0.67%,
ReleaseThinLTO -0.97%.

http://llvm-compile-time-tracker.com/compare.php?from=0a929b6978a068af8ddb02d0d4714a2843dd8ba9&to=2e630629b43f64b60b282e90f0d96082fde2dacc&stat=instructions

Reviewed By: asbirlea

Differential Revision: https://reviews.llvm.org/D86487
2020-08-28 11:19:04 +01:00
Florian Hahn bb024c3c4e [DSE,MemorySSA] Remove short-cut to check if all paths are covered.
The post-order number early continue does not work in some cases, e.g.
if a path from EarlierAccess to an exit includes a node that dominates
EarlierAccess in a cycle.

The short-cut only has very minor impact on compile-time, so it seems
straight-forward to remove it for now:

http://llvm-compile-time-tracker.com/compare.php?from=062412e79fcfedf2cf004433e42036b0333e3f83&to=d7386016a77ce1387bdbbf360f1de157faea9d31&stat=instructions

Fixes PR47285.
2020-08-27 12:42:40 +01:00
Florian Hahn e717fdb0f1 [DSE,MemorySSA] Traverse use-def chain without MemSSA Walker.
For DSE with MemorySSA it is beneficial to manually traverse the
defining access, instead of using a MemorySSA walker, so we can
better control the number of steps together with other limits and
also weed out invalid/unprofitable paths early on.

This patch requires a follow-up patch to be most effective, which I will
share soon after putting this patch up.

This temporarily XFAIL's the limit tests, because we now explore more
MemoryDefs that may not alias/clobber the killing def. This will be
improved/fixed by the follow-up patch.

This patch also renames some `Dom*` variables to `Earlier*`, because the
dominance relation is not really used/important here and potentially
confusing.

This patch allows us to aggressively cut down compile time, geomean
-O3 -0.64%, ReleaseThinLTO -1.65%, at the expense of fewer stores
removed. Subsequent patches will increase the number of removed stores
again, while keeping compile-time in check.

http://llvm-compile-time-tracker.com/compare.php?from=d8e3294118a8c5f3f97688a704d5a05b67646012&to=0a929b6978a068af8ddb02d0d4714a2843dd8ba9&stat=instructions

Reviewed By: asbirlea

Differential Revision: https://reviews.llvm.org/D86486
2020-08-27 10:02:02 +01:00
Florian Hahn e19ef1aab5 [DSE,MemorySSA] Cache accesses with/without reachable read-clobbers.
Currently we repeatedly check the same uses for read clobbers in some
cases. We can avoid unnecessary checks by keeping track of the memory
accesses we already found read clobbers for. To do so, we just add
memory access causing read-clobbers to a set. Note that marking all
visited accesses as read-clobbers would be to pessimistic, as that might
include accesses not on any path to  the actual read clobber.

If we do not find any read-clobbers, we can add all visited instructions
to another set and use that to skip the same accesses in the next call.

Reviewed By: asbirlea

Differential Revision: https://reviews.llvm.org/D75025
2020-08-25 08:48:46 +01:00
Florian Hahn d1a1cce5b1 [DSE,MemorySSA] Do not use callCapturesBefore in isReadClobber.
Using callCapturesBefore potentially improves the precision and the
number of stores we can remove. But in practice, it seems to have very
little impact in terms of stores removed. For example, for
SPEC2000/SPEC2006/MultiSource with -O3 -flto, ~50 more stores are
removed (out of ~26900 stores removed). But in terms of compile-time, it
is very expensive and the patch gives substantial compile-time
improvements: Geomean O3 -0.24%, ReleaseThinLTO -0.47%, ReleaseLTO-g
-0.39%.

http://llvm-compile-time-tracker.com/compare.php?from=612a0bff88ed906c83b82f079d4c49e5fecfb9d0&to=e6c86b96d20d97dd88e903a409bd8d39b6114312&stat=instructions
2020-08-24 16:19:42 +01:00
Florian Hahn b99a5eb659 [DSE,MemorySSA] Delay PointerMayBeCaptured calls until actually needed.
Avoid computing InvisibleToCallerBefore/AfterRet up front. In most
cases, this information is not really needed. Instead, introduce helper
functions to compute and cache the result on demand.

Notably, this also does not use PointerMayBeCapturedBefore for
isInvisibleToCallerBeforeRet, as it requires the killing MemoryDef as
starting instruction, making the caching ineffective. But it appears the
use of PointerMayBeCapturedBefore has very limited benefits in practice
(e.g. on SPEC2000/SPEC2006/MultiSource there are no binary changes with
-O3 -flto). Refrain from using it for now, to limit-compile-time.

This gives some nice compile-time improvements:
http://llvm-compile-time-tracker.com/compare.php?from=db9345f6810f379a36752dc52caf5230585d0ebd&to=b4d091047e1b8a3d377d200137b79d03aca65663&stat=instructions
2020-08-24 14:05:44 +01:00
Florian Hahn 2431b143ae [DSE,MemorySSA] Limit elimination at end of function to single UO.
Limit elimination of stores at the end of a function to MemoryDefs with
a single underlying object, to save compile time.

In practice, the case with multiple underlying objects seems not very
important in practice. For -O3 -flto on MultiSource/SPEC2000/SPEC2006
this results in a total of 2 more stores being eliminated.

We can always re-visit that in the future.
2020-08-24 13:00:17 +01:00
Florian Hahn 2843c9fe0a [DSE,MemorySSA] Keep single DL instance in DSEState (NFC).
Small cleanup, also removes one instance of getting DataLayout without
using it later.
2020-08-23 15:56:38 +01:00
Florian Hahn 5e7e2162d4 [DSE,MemorySSA] Use BatchAA for AA queries.
We can use BatchAA to avoid some repeated AA queries. We only remove
stores, so I think we will get away with using a single BatchAA instance
for the complete run.

The changes in AliasAnalysis.h mirror the changes in D85583.

The change improves compile-time by roughly 1%.
http://llvm-compile-time-tracker.com/compare.php?from=67ad786353dfcc7633c65de11601d7823746378e&to=10529e5b43809808e8c198f88fffd8f756554e45&stat=instructions

This is part of the patches to bring down compile-time to the level
referenced in
http://lists.llvm.org/pipermail/llvm-dev/2020-August/144417.html

Reviewed By: asbirlea

Differential Revision: https://reviews.llvm.org/D86275
2020-08-22 08:36:35 +01:00
Florian Hahn 9f7350672e [DSE,MemorySSA] Handle atomicrmw/cmpxchg conservatively.
This adds conservative handling of AtomicRMW/AtomicCmpXChg to
isDSEBarrier, similar to atomic loads and stores.
2020-08-21 10:42:42 +01:00
Florian Hahn a0e92ffd0d [DSE,MemorySSA] Split off partial tracking from isOverwite.
When traversing memory uses to look for aliasing reads/writes, we only
care about complete overwrites. This patch splits off the partial
overwrite tracking from isOverwrite This avoids some unnecessary work
when checking for read/write clobbers with MemorySSA-DSE.
isOverwrite, which skips the partial overwrite tracking.

This gives a relatively small improvement
http://llvm-compile-time-tracker.com/compare.php?from=ef2a2f77f87553a0a4a39f518eb9ac86b756bda6&to=658f3905dd96d3415f3782adc712c79fa59a4665&stat=instructions

This is part of the patches to bring down compile-time to the level
referenced in
http://lists.llvm.org/pipermail/llvm-dev/2020-August/144417.html

Reviewed By: asbirlea

Differential Revision: https://reviews.llvm.org/D86280
2020-08-21 09:13:59 +01:00
Florian Hahn c0cbe6453a [DSE] Remove dead argument from removePartiallyOverlappedStores (NFC).
The argument is unused and can be removed.
2020-08-19 19:33:52 +01:00
Florian Hahn 1a55fbceaa [DSE,MemorySSA] Use NumRedundantStores instead of NumNoopStores.
Legacy DSE uses NumRedundantStores, while MemorySSA DSE uses
NumNoopStores. We should just use the same counter.
2020-08-19 08:50:33 +01:00
Florian Hahn 4cc20aa743 [DSE,MemorySSA] Skip access already dominated by a killing def.
If we already found a killing def (= a def that completely overwrites
the location) that dominates an access, we can skip processing it
further.

This does not help with compile-time, but increases the number of memory
accesses we can process with the same scan budget, leading to more
stores being eliminated.

Improvements with this change

Same hash: 203 (filtered out)
Remaining: 34
Metric: dse.NumFastStores

Program                                        base    dom     diff
 test-suite...rolangs-C++/family/family.test     2.00    4.00  100.0%
 test-suite...ProxyApps-C++/CLAMR/CLAMR.test   172.00  229.00  33.1%
 test-suite...ks/Prolangs-C/agrep/agrep.test    10.00   12.00  20.0%
 test-suite...oxyApps-C++/miniFE/miniFE.test    44.00   51.00  15.9%
 test-suite...marks/7zip/7zip-benchmark.test   1285.00 1474.00 14.7%
 test-suite...006/450.soplex/450.soplex.test   254.00  289.00  13.8%
 test-suite...006/447.dealII/447.dealII.test   2466.00 2798.00 13.5%
 test-suite...000/197.parser/197.parser.test     9.00   10.00  11.1%
 test-suite.../Benchmarks/nbench/nbench.test    85.00   91.00   7.1%
 test-suite...ce/Applications/siod/siod.test    68.00   72.00   5.9%
 test-suite...ications/JM/lencod/lencod.test   786.00  824.00   4.8%
 test-suite...6/464.h264ref/464.h264ref.test   765.00  798.00   4.3%
 test-suite.../Benchmarks/Ptrdist/bc/bc.test   105.00  109.00   3.8%
 test-suite...lications/obsequi/Obsequi.test    29.00   28.00  -3.4%
 test-suite...3.xalancbmk/483.xalancbmk.test   1322.00 1367.00  3.4%
 test-suite...chmarks/MallocBench/gs/gs.test   118.00  122.00   3.4%
 test-suite...T2006/401.bzip2/401.bzip2.test    60.00   62.00   3.3%
 test-suite...6/482.sphinx3/482.sphinx3.test    30.00   31.00   3.3%
 test-suite...rks/tramp3d-v4/tramp3d-v4.test   862.00  887.00   2.9%
 test-suite...telecomm-gsm/telecomm-gsm.test    78.00   80.00   2.6%
 test-suite...ediabench/gsm/toast/toast.test    78.00   80.00   2.6%
 test-suite.../Applications/SPASS/SPASS.test   163.00  167.00   2.5%
 test-suite...lications/ClamAV/clamscan.test   240.00  245.00   2.1%
 test-suite...006/453.povray/453.povray.test   1392.00 1419.00  1.9%
 test-suite...000/255.vortex/255.vortex.test   211.00  215.00   1.9%
 test-suite...:: External/Povray/povray.test   1295.00 1317.00  1.7%
 test-suite...lications/sqlite3/sqlite3.test   175.00  177.00   1.1%
 test-suite...T2000/256.bzip2/256.bzip2.test    99.00  100.00   1.0%
 test-suite...0/253.perlbmk/253.perlbmk.test   629.00  635.00   1.0%
 test-suite.../CINT2006/403.gcc/403.gcc.test   1183.00 1194.00  0.9%
 test-suite.../CINT2000/176.gcc/176.gcc.test   647.00  653.00   0.9%
 test-suite...ications/JM/ldecod/ldecod.test   512.00  516.00   0.8%
 test-suite...0.perlbench/400.perlbench.test   1026.00 1034.00  0.8%
 test-suite...-typeset/consumer-typeset.test   1876.00 1877.00  0.1%
 Geomean difference                                             7.3%
2020-08-17 20:54:48 +01:00
Florian Hahn df4756ec6c [DSE,MemorySSA] Check for underlying objects first.
isWriteAtEndOfFunction needs to check all memory uses of Def, which is
much more expensive than getting the underlying objects in practice.
Switch the call order, as recommended by the TODO, which was added as
per an earlier review.

This shaves off a bit of compile-time.
2020-08-17 18:52:18 +01:00
Florian Hahn 139810449b [DSE,MemorySSA] Account for ScanLimit == 0 on entry.
Currently the code does not account for the fact that getDomMemoryDef
can be called with ScanLimit == 0, if we reached the limit while
processing an earlier access. Also tighten the check a bit more and bump
the scan limit now that it is handled properly.

In some cases, this brings a 2x speedup in terms of compile-time.
2020-08-17 17:55:14 +01:00
Florian Hahn 3b0878a370 [DSE,MSSA] Fix crash when using tryToMergePartialOverlappingStores.
We are re-using tryToMergePartialOverlappingStores, which requires
earlier to domiante Later. In the long run,
tryToMergeParialOverlappingStores should be re-written using MemorySSA.

Fixes PR46513.
2020-08-13 12:07:56 +01:00
Vitaly Buka b0eb40ca39 [NFC] Remove unused GetUnderlyingObject paramenter
Depends on D84617.

Differential Revision: https://reviews.llvm.org/D84621
2020-07-31 02:10:03 -07:00
Vitaly Buka 89051ebace [NFC] GetUnderlyingObject -> getUnderlyingObject
I am going to touch them in the next patch anyway
2020-07-30 21:08:24 -07:00
Matt Arsenault 023883a834 IR: Rename Argument::hasPassPointeeByValueAttr to prepare for byref
When the byref attribute is added, there will need to be two similar
functions for the existing cases which have an associate value copy,
and byref which does not. Most, but not all of the existing uses will
use the existing version.

The associated size function added by D82679 also needs to
contextually differ, and will help eliminate a few places still
relying on pointee element types.
2020-07-16 13:50:49 -04:00
John Brawn 20854d85e1 [DSE,MSSA] Recognise init_trampoline in getLocForWriteEx
This fixes an instance where MemorySSA-using Dead Store Elimination is failing
to do a transformation that the non-MemorySSA-using version does.

Differential Revision: https://reviews.llvm.org/D83783
2020-07-15 12:18:58 +01:00
Florian Hahn 80970ac875 [DSE,MSSA] Eliminate stores by terminators (free,lifetime.end).
This patch adds support for eliminating stores by free & lifetime.end
calls. We can remove stores that are not read before calling a memory
terminator and we can eliminate all stores after a memory terminator
until we see a new lifetime.start. The second case seems to not really
trigger much in practice though.

Reviewers: dmgreen, rnk, efriedma, bryant, asbirlea, Tyker

Reviewed By: asbirlea

Differential Revision: https://reviews.llvm.org/D72410
2020-07-08 08:59:46 +01:00